Jan 19 2008

Fast-access dictionary for Boggle

Published by Dougal at 7:25 pm under Maths & Computer Science, Programming

I’ve spent a fair bit of time thinking about Boggle, so it’s time to act! At this point all my readers are going to bail out, because I start putting code on screen. If you’re not afraid of a bit of Haskell, step on through.

The retrieval tree

The dictionary is a trie, mapping alphabetical characters to sub-tries. Each sub-trie contains all the valid letters which follow the previous one. There is also a boolean flag to indicate whether that character represents the end of a valid word on its own.

newtype Pair = P (Bool, Trie) deriving (Show, Eq)
data Trie = T (M.Map Char Pair)
            deriving (Show,Eq)

I started off with the Pair type as a primitive tuple but for some reason it wasn’t doing what I expected. Giving the tuple its own name helped me to address the problem.

Building tries from smaller tries

Tries are manipulated as monoids, ie they have a binary operator (the plus) and an identity (the zero). Using the typeclass really helped to simplify this aspect of the programming because I didn’t have to think about what types I was manipulating. The compiler inferred all that on its own.

class Monoid m where
    zero :: m
    plus :: m -> m -> m
 
instance Monoid Bool where
    zero = False
    plus = (||)
 
instance (Monoid m, Monoid n) => Monoid (m,n) where
    zero = (zero, zero)
    plus (m1,n1) (m2,n2)= (m1 `plus` m2, n1 `plus` n2)
 
instance Monoid Pair where
    zero = P (zero,zero)
    plus (P l) (P r) = P (l `plus` r)
 
instance Monoid [a] where
    zero = []
    plus = (++)
 
instance (Ord k, Monoid v) => Monoid (M.Map k v) where
    zero = M.empty
    plus = M.unionWith plus
 
instance Monoid Trie where
    zero = T zero
    plus (T d1) (T d2) = T (d1 `plus` d2)

The implementations of plus and zero for each instance were chosen to suit this problem. So they’re not assured to be mathematically coherent — I’m using the stated algebraic structures as “a reference point rather than a shackle” as Terry Pratchett would put it.

Simple construction and deconstruction

I won’t bother printing out the full listing for the next step as it would just be really boring. So I made a few more functions to convert whole strings, and lists of strings, into tries.

fromString :: String -> Trie
fromString s = makeword (\c t -> unit c True t) s zero
 
fromStrings :: [String] -> Trie
fromStrings = join . map fromString

Then there are functions to do the reverse. I need these when I want to print out the answers:

extractword :: Trie -> [Maybe Char]
extractword t = if islastchar c t then [Just c]
                else if t'==zero then [Nothing] else Just c  : extractword t'
    where c = head $ keys t
          t'= child c t
 
toString :: Trie -> Maybe String
toString t | t == zero = Nothing
           | otherwise = sequence (extractword t)
 
toStrings :: Trie -> [String]
toStrings = mapMaybe (toString) . decompose

The function decompose is the reverse of plus — it partitions objects into constituent parts. A decomposition of an N-ary tree is a list of single-path trees, one for each of the paths in the original.

Extraction of words from tries is a bit tricky, because there are a number of possible outcomes.

  1. A sequence of characters may form a complete word. If so, we stop looking further.
  2. A sequence of characters may not form a complete word, but there could be further letters in the sequence. In which case, we delve further, appending letters as we go.
  3. If a sequence of characters doesn’t form a complete word and there are no more letters after that then we’ve basically come across a dud prefix. All the letters collected so far are invalid, so we abandon the whole thing.

Thankfully the Maybe monad is pretty good for this. If every single stage of the lookup produces a valid character then we get a whole word out at the end. If there is a fault at any stage then the whole thing is aborted and we skip to the next one. Without using any gotos. ;-)

Properties of construction/destruction

There are some nice identities encapsulated in these functions. For example, putting a word into a trie and then pulling it out is the return in the Maybe monad. Putting in a list then pulling it out again is an alphabetical sort over that list.

> -- just like `return "boggle" :: Maybe String`
> (toString . fromString) "boggle"
Just "boggle"
 
> -- just like `sort ["boggle","words","dictionary"]`
> (toStrings . fromStrings) ["boggle","words","dictionary"]
["boggle","dictionary","words"]

I also implemented some set operations on tries, where union is essentially the same as the plus we have already seen. Its partner, intersect, is a bit different because it’s not a proper intersection. Again, I fiddled with the definition for boolean intersection so that it’s identical to the union. I will probably go to Set Theory hell for this but it made the implementation shorter.

Conclusions

I was really pleased with how simple the typeclass approach to programming is. All operations on superstructures are just the same operations on their substructures, right down to the primitives. It almost became mechanical to implement once the primitives were in place, which was surprising.

Next I’ll look at creating the game board and scanning it for candidate words. Tune in next time!

One Response to “Fast-access dictionary for Boggle”

  1. L33tminionon 18 Feb 2008 at 5:36 am

    Interesting. Seeing your implementation made me look up whether the Haskell libraries have an implementation of Monoid (which of course they do, Data.Monoid). That implements most of the things you have in your typeclass. The standard library version has different function names (“mempty” and “mappend”) and two implementations for Monoid Bools (Any and All).

    Anyways, I’m just starting to learn Haskell, so writeups like this really help me start to wrap my head around it. Thanks!

Trackback URI | Comments RSS

Leave a Reply