Jan 15 2008

Solving the problem of Boggle

Published by Dougal at 9:16 pm under Maths & Computer Science, Programming

First, solve the problem. Then, write the code.

— John Johnson

A few days I looked at the game of Boggle with a view to writing a program to solve the same problem that the user has: find as many (long) words as possible from a given grid of letters. I didn’t think much about how to implement it, just how to decompose the game into several discrete sub-problems.

The next stage is to think about how the problem might be solved conceptually — what approaches to take to structuring data, and how this might lend itself to an elegant solution.

Storing Words, Querying Words

The game of Boggle revolves around letters and words. Any approach to the problem has to deal with words efficiently.

We need access to a complete word list of all the valid words, the ‘dictionary’. You can’t win the game unless you have words from this list. A lot of words will be prefixes of other words, so it would be wasteful to store them all separately: “he”, “heat”, “heath”, “heather”.

From this, we know that if we find a long sequence of characters on the Boggle board (such as “heather”) we’ll automatically have all the prefixes of that word — “heat” and “heath”. So it seems that the key to an efficient solution for this aspect of the problem revolves around word prefixes.

Photo: The entry for "life" in a dictionary.

Pruning the search tree

Further, we know that the set of real words that can be made from a Boggle board is actually quite small compared with the possible combinations of letters. Many candidate words will be meaningless because they start with illegal combinations of letters. No matter what you add to the end of “zdfg” you’ll never find it in the dictionary. Neither “zdfgreet” nor “zdfgown” nor “zdfgrateful” are real words, even though they end in perfectly legitimate ways. So it’s safe to say that all the time looking for suffixes which make “zdfg” into a valid word will be fruitless.

In that case, we really need to pay attention to the Boggle board but keep one eye on the dictionary of real words too, so that we can instantly prune all the useless prefixes we come across.

Between these two important facets — short words are prefixes of long ones, and searches must be restricted to valid prefixes — it’s plain that prefixes are key to solving this problem. The prefix tree is the common way to store sequences of characters with common prefixes.

Generating the possibilities

Given a list of words stored on disk, we can easily convert this into a prefix tree which holds our valid dictionary. We will compare the candidate strings of letters with the real words. But first we need to produce the candidates. This isn’t so difficult, since the rules are well defined. If we imagine a grid with a letter at each point in the grid, then all the letters will have co-ordinates — an x and y pair. From any single point we can easily determine which way to move next — moving up, down, left or right is a matter of adding or subtracting one from the x or y values.

Once a spot has been visited it can’t be used again — each letter can only appear once per word. We have to mark each grid point as we use it so that we don’t go back to it. But in actual fact, it’s much easier to keep a list of unvisited grid references, and delete each one from the list as we go.

The processes of generating new words and then pruning invalid ones go hand in hand. At each stage, we want to look at our possibilities, but remove the ones we know will lead to invalid words. Say we have the sequence “cardboar”. The neighbouring letters to the final ‘r’ are ‘t’, ‘w’ and ‘d’. If we check the dictionary with the prefix of “cardboar” we find only one possible alternative, ‘d’. So the others can be instantly discarded.

In fact, we can consider the valid letters from at each decision point to be the intersection of all possible letters (ie, the unvisited letters reachable from this point) and the valid next steps according to the preloaded dictionary. By this technique we waste as little time as possible trying to make valid words out of invalid prefixes.

Display

The tedious part of most problems is displaying the answer, but it doesn’t have to be. There’s little point on having beautiful interfaces on the inside but ugly interfaces to the world. It’s worth thinking about useful aspects of the Boggle output that we could gather. What distributions of letters produce the most words? Which ones produce the longest words? Finding the words is only half the fun.

I’ll think about that in a bit of detail next time.

One response so far

One Response to “Solving the problem of Boggle”

  1. Nickon 17 Mar 2008 at 11:44 pm

    Interestingly, I’d say that making the user interface is one of the more fun bits. I’m already planning the animation in my head…

    I don’t think the computer makes a valid opponent, however… it’s a bit of an exercise in Artificial Stupidity, and therefore not likely to be trusted by players since the scores aren’t really comparable. A computerised tutor, however…