Archive for the 'Maths & Computer Science' Category

Aug 23 2010

Update on SICP: two chapters done

Earlier today our little SICP study group agreed to start on Chapter 3 for our next meeting. There are five chapters in total so we are past the halfway mark and it’s been pretty interesting so far. To recap the recent stuff:

  • Representing data structures as functions, and data as functions. Specifically, calculating with Church numerals and simulating cons, car and cdr without built-in operations.
  • Manipulation of code as data, including symbolic differentiation of simple algebraic formulae. Converting “3x²” into “6x” with relevant simplification and grouping of symbols.
  • Some straight-up data structure work — ordered lists, trees, sets — and using them to implement Huffman encoding.
  • Organising multiple interacting systems, in this case a generic arithmetic package which works (almost) seamlessly on integers, rationals, reals, complex numbers and finally polynomial terms. Using this as a way of discussing functional and object-oriented design and introducing the expression problem.

I can definitely say that this last section, building a multi-module (for some definition of module) system made me really miss static types. Not being able to tell where functions were defined and called with different arities, or were using different symbols, until the problem manifested at runtime, was infuriating. The book is very interesting but the language is not to my taste. :-)

For comparison, here’s Yaron Minsky talking about using Ocaml for large scale programming:

Effective ML from Yaron Minsky on Vimeo.

No responses yet

Aug 06 2010

Why my unorthodox solution to the Cyber Security Challenge worked

In case you didn’t see his comment, I just wanted to promote Andrew’s answer to my question to the front page. I had wondered why my dividing-by-two techniques had worked to find the answer to the second part of the Cyber Security Challenge. The explanation is quite obvious when written in full.

The input sequence was a series of 8 bits which had been permuted in some way. According to the published solution they had been rearranged so that the right-most bits had been looped around and pushed onto the left side:

6 7 8 1 2 3 4 5

The recommended solution was of course to take those three and move them back round to the right hand side:

1 2 3 4 5 6 7 8

My accidental solution relied on two features of the problem. First, as I said previously, the hexadecimal encoding could be reordered before decoding. Since each hex number represents 4 bits, we’re going to reorder around the middle:

6 7 8 1 : 2 3 4 5

If we swap these 4-bit sequences first, we get something which is very close to the desired sequence:

2 3 4 5 : 6 7 8 1

The next step I performed was division-by-two. In binary this is represented by shifting all the numbers one place to the right. (Think of dividing by ten normally — you just shift the decimal point one place to the left.) When we do this right-shift the right-most digit disappears, and a left-most digit is created which is always zero in our case. I’ll mark this place with a # symbol.

# 2 3 4 5 6 7 8

We’ve almost got a full understanding of the solution here, but one problem remains. My method and the official method are nearly identical, except for the left-most bit:

1 2 3 4 5 6 7 8 <-- official solution
# 2 3 4 5 6 7 8 <-- my solution

We don’t know what any of the bits 1–8 represents, but we know that # is always a zero. So my solution “works” when bit number 1 is zero. And it just so happens that ASCII, the standard way of encoding simple characters in binary, only uses the 7 bits on the right. The left bit, number 1, is always 0 for simple text encoding.

If the message had been encoded in some extended ASCII format that uses all 8 bits my method would have failed. For the general case, bit 1 is only zero half the time!

No responses yet

Aug 02 2010

Part 2 of Cyber Security Challenge: Hexadecimal gibberish

This second part of the problem was more difficult, though the solution is much shorter to present. The input message looks like this, in part:

68edcdec4e2c8eae8d2c8e2dedcd6e04d2042fedae52ceac
04ccedaecd8c042ccd8c046cedad0e8dac8eac8c048e0dac

Under the assumption that each pair of characters represents a byte, we split the stream into pairs.

splitStream = splitEvery 2

Of course each pair isn’t actually a hexadecimal number, though it looks like one. It’s a string of characters, so we need to turn it into something the computer will recognise as a number.

Now, there’s a robust, I’m-a-serious-engineer way of doing this and then there’s the way I chose, which is to prefix all the strings with “0x” and use the read routine. The Read typeclass is not meant to represent a robust parsing mechanism but since this is a one-shot thing I think we’ll both just ignore it, yeah?

hexToInt :: String -> Int
hexToInt = read . ("0x" ++) -- ouch!

Having read in our number I tried to do what I did in the previous exercise and print it out, which produced complete nonsense. Then I decided that reversing the string before converting it to numbers would be helpful. I did that, but the result was still nonsense, just of a different shade.

Then I decided that my resulting numbers were a bit large. The number which represent capital A is 65, and I was getting numbers in the 200 range. Then I realised, “all these numbers are even, aren’t they?” So I divided the whole lot by two.

Outputting that message gave me a backwards English message. Aha! It’s close but there’s something wrong. I threw in another “reverse” statement and got an answer. At this point I noticed I had two reverse statements — one at the beginning and one at the end. This seemed like it would be the cause of my problems so I removed them both and, Robert’s your father’s brother, I was back to gibberish again.

What happened? What I had failed to realise was that when I had reverse my original message I had done so before splitting into pairs of characters. So ABCD becomes DCBA. After splitting this is [DC,BA]. If each pair is converted into a printable character, represented by f(XY), then the result is [f(DC),f(BA)]. I then reversed the message to get [f(BA),f(DC)].

If I didn’t do any reversing but went through the same splitting and conversion process, I got [f(AB),f(CD)]. Look — f(AB) is not the same as f(BA)! By reversing the list before splitting and again after conversion, I was implicitly reversing the order of the characters in each pair.

Obviously I had to reverse only the pairs after they’d been split, rather than the whole list at the beginning and end. If I do this then print the result I get an answer which is close to right.

byteToChar = chr . flip div 2 . hexToInt . reverse
message = map byteToChar . splitStream
 
loadfile = (head . lines) `fmap` readFile "hexstring.txt"
main = loadfile >>= putStrLn . message

The resulting message is:

Congratulations youve found and completed the REAL challenge. Your win code is cyb3r=s3cur1ty*ch@ll3nge+26-07-2010.

Please email this code to our team at media@cybersecuritychallenge.org.uk. If youre the first person to do so, and can prove you meet the eligibility criteria (British citizen currently resident in the UK) we will be in touch to advise how to claim your prize. Well done and good luck in the Cyber Security Challenge competitions taking place throughout the rest of the year.

There are a few infelicities which are explained by the official solution. My divide-by-two technique is not the proper approach so I think some of the characters get lost in the decryption process. But as the saying goes, close enough for government work!

The solution states:

The challenge was based on a bitshift operation applied to a string, here each byte’s “3 least significant bits” have been added to the left side of the byte (making them the most significant bits respectively)

Assuming we have bits one to eight:

1 2 3 4 5 6 7 8

what I did was divide everything by two, which we do in binary by shifting everything to one side so the least significant bit disappears. We fill the opposite edge with a zero here:

0 1 2 3 4 5 6 7

What the solution demands is to rotate the bits like so:

6 7 8 1 2 3 4 5

I am still not sure how it is that my solution matches so neatly with their own. Answers on a postcard.

If I hadn’t stumbled on a solution like this, the sensible approach might have been to produce a histogram of common characters. In this case, the most commonly used character encoded the space, followed closely by “e”, the most common letter in standard English texts. From there it would have been a bit of a slog to produce the result but it could be done.

The two other solutions I have found both use the official method without problems. I suspect it’s got something to do with default word lengths (8 bit versus 32 bit) and signed integers, though I’ve not thought deeply on the issue.

3 responses so far

Jul 31 2010

Part 1 of Cyber Security Challenge: Decoding the image

The first part of the Cyber Security Challenge requires you to decode a long stream of letters, numbers and symbols into something coherent. This was by far the easiest part of the challenge. The symbols look like this:

/9j/4AAQSkZJRgABAQEAYABgAAD/4QBaRXhpZgAATU0AKgAAAAgABQMBAAUAAAABAAAASgMDAAEA
AAABAAAAAFEQAAEAAAABAQAAAFERAAQAAAABAAAOxFESAAQAAAABAAAOxAAAAAAAAYagAACxj//b
AEMACAYGBwYFCAcHBwkJCAoMFA0MCwsMGRITDxQdGh8eHRocHCAkLicgIiwjHBwoNyksMDE0NDQf

This is only the first three lines — there’s 362 lines of this. The most important bit is that the last line ends in a = sign. This symbol is used for padding out messages when they are encoded in Base 64 format, so if any string has an equals symbols on the end it’s worth seeing what happens if you decode it as if it were Base64. (The truth of the matter is that by the time my mate Rich had told me about this challenge he’d already decoded the image so I didn’t do any of this stuff!)

Thankfully there are plenty of programs that will do the decoding for you, and the resulting file is a JPEG file. Brilliant:

This is XKCD comic number 538, with a subtle difference. Round the outside of the image there is an uneven dotted border. This isn’t in the original comic, and it looks irregular which suggests that it’s not just a pretty pattern but that it encodes some further information.

For the next stage I converted the image to PNG format because it was the easiest format to load and process.

The first part we need to do is load the image file and extract all the bytes from the actual image part, ignoring any metadata. The PNG file is represents a 24 bit colour image, with each component (red, green and blue) stored as an 8 bit number from 0–255. We just return the whole thing as a long list of bytes, starting at the top left and scanning left to right, top to bottom. It’s not efficient but it’s very simple to reason about because we don’t need to think about array locations.

getimage :: IO [Word8]
getimage = do
  (Right img) <- loadPNGFile "decode.png"
  getElems (imageData img)

The long list may be conceptually simple but it’s not representative of the structure of the image. Since each pixel is represented by 3 consecutive numbers we want to gather those numbers together, so we can start dealing with them as pixels. Thus we convert [r,g,b,r,g,b,r,g …] into [[r,g,b],[r,g,b],…]

splitIntoBytes = splitEvery 3

Even though each pixel can represent many colours we’re really only dealing with black and white here so we can represent each component as fully-on or fully-off. Each pixel can really be stored as boolean values like [True, False, False]. Since it’s possible that some values are not absolutely 0 or absolutely 255 I’ve divided them up the middle — anything darker than 128 is 0, and anything lighter is white.

Once each pixel is a list of booleans we can collapse that down into a single black/white value by taking the conjunction. If all values are True then the pixel is True (ie, black) otherwise False (white).

normalise = and . map (< 128)

At this stage our image is no longer a list of integers but a list of booleans. The original list had height-times-width-times-3 elements. Since we’ve collapsed all those three elements into a single value representing each pixel we now have height-times-width elements. The image is 350x175 pixels so we can split it into rows by cutting the list every 350 elements. This gives us 175 rows.

tomatrix = splitEvery 350 . map normalise . splitIntoBytes

The next stage is, I think, the most beautiful aspect of representing the image as a list of lists. We have a full image but we only want the wavy lines of pixels which make up the border. How do we extract those pixels?

The first thing to notice is that the border is not continuous. The top and bottom edges go to the edge of the screen at both sides, but the two sides don’t meet at the top or bottom. To illustrate:

#################
 
#               #
#               #
#               #
 
#################

So the first question, “where do we start and end?” is answered for us. Each segment is discrete. It was my colleague Rich which pointed out the slight gaps in the border which makes this simplification possible. If the border were complete we’d have to determine whether the corner pieces were part of both the horizontal and the vertical (like a crossword where Across and Down share letters) or not.

The second question we ask is, “which direction does the sequence go in?”. Do we go clockwise from top left, following this alphabetical sequence?

a b c d e
n       f
m       g
l k j i h

Or do we maybe scan left-to-right, ignoring gaps?

a b c d e
f       g
h       i
j k l m n

What about anticlockwise? A mixture of the above? I took the approach which seemed obvious to me, clockwise from top left, and it turned out to be correct. But interestingly the final message is repeated in part so choosing the wrong start point would not have mattered, and it would have been obvious that the output was almost right.

Each row is a list, stored in order from top to bottom. This means the top border is just the first list:

topedge = head

The bottom border is the last list, but because we’ve chosen clockwise we need to reverse the list too.

bottomedge = reverse . last

The right edge is the last element of each list, ignoring the first 3 rows and the last three rows, because as mentioned above, the side patterns are shorter. These two look slightly complicated but it’s mostly dealing with trimming the top and bottom edges off.

rightedge = reverse . drop 3  . reverse . map last . drop 3
leftedge = drop 3 . reverse . map head . drop 3

Now we can extract each edge of the border we can join them all together into our encoded message. This converts our list of lists, which is a complete picture, into a list of bits encoded in the border.

msgstream :: [[Bool]] -> [Bool]
msgstream bits = concatMap ($ bits) [topedge, rightedge, bottomedge, leftedge]

Now we have a long list of bits. And it just so happens the number of bits we have is divisible by eight! This is a good sign because binary information is typically grouped into sets of eight. For the next stage we group our list into sets of 8 bits and turn each 8 into a single number.

One of the tricky aspects about any number is that you can’t tell in advance which end to start from. You and I know that 12 is “twelve” not “twenty-one” but that’s because we know to read numbers from left to right. Computer formats have used both in the past so I wasn’t sure which one would be important to me — is the biggest number the first digit or the last digit? The actual question is, “is the leading digit the most significant bit or the least significant bit?”. I calculated both to see what would happen, and it turned out that most-significant-bit was the way to go. Thankfully, MSB and LSB are just the reverse of each other, so by implementing one we get the other for free!

msb,lsb :: [Bool] -> Int
msb = lsb . reverse
lsb = bitsToInt . map (fromIntegral . fromEnum)

We calculate least significant bit by converting all the True values to 1 and all the False values to zero, and then multiplying element-wise by a stream of powers of two. To illustrate:

  1 2 4 8 16
* 0 1 1 0 1
= 0 2 4 0 16

Then we just add that list up, so “0 1 1 0 1” is “0+2+4+0+16” or 22.

bitsToInt = sum . zipWith (*) (iterate (2*) 1)

After all that prelude we can put this segment together, dividing the image up into its matrix, extracting the bits from the border of the image, splicing them up into sets of eight and converting each 8 bits into a single number:

msgbits = map msb . splitEvery 8 . msgstream . tomatrix

Now what do we do with our numbers? Readable characters are represented internally as numbers, so it is a simple thing to convert between number and printable characters. If we convert each number to a character we get nonsense, but consistent and tantalising nonsense:

Cyrnfr sbyybj guvf yvax: uggcf://plorefrphevglpunyyratr.bet.hx/834wgc.ugzy uggcf://plorefrphevglpunyyratr.bet.hx/834wgc.ugzy

Look at those sequences “uggcf://” repeated twice. That looks so much like a web address, “https://”. If we assume that it is a web address how has it been altered? Each letter in a pair, h/u, t/g, c/p, is 13 characters apart from its partner. But the punctuation symbols aren’t any different.

This looks like the encoding called “rot13” where each character is shifted 13 characters along in the alphabet, and if we reach the end we wrap back to the start. Also, since the first letter of the nonsense message is a capital and none of the rest are it seems like case is being preserved by this encoding.

We convert back to sensible words by rot13 encoding again, since 13+13=26 so any two applications of this transformation will undo each other.

Uppercase and lowercase are treated separately but by the same process. We’re processing some character which we call c and we want to know how many characters it is away from the start of the alphabet. The letter ‘a’ (or ‘A’) is 0 characters away from the start, ‘b’ is 1 character and so on.

If we assume an infinite stream of letters “a b c … x y z a b c…” repeating the alphabet, then the Nth letter in that stream is the one which is N away from the start. The 0th letter is ‘a’, the first letter ‘b’. But if we chop the first 13 characters off this stream, so it’s “n o p … y z a b c …” then the 0th letter is ‘n’, the 1st letter ‘o’ and so on. Each offset now directly maps one letter onto its partner letter. And because the sequences of letters is endless we don’t have to worry about falling off the end. The mapping just loops back on itself:

0 1 2 3 4 5 6 ...
a b c d e f g ...
n o p q r s t ...

Thus we can find the alternate character in our pair by checking first of all how many characters our current letter is from the start, and then looking for the equivalent character in the list which has had its head chopped off. We do the same for the upper case characters and just pass through unchanged anything which isn’t upper or lower case — all the punctuation and spaces.

rot13 :: Char -> Char
rot13 c | isLower c = lowercase!!(ord c - ord 'a')
        | isUpper c = uppercase!!(ord c - ord 'A')
        | otherwise = c
  where lowercase = drop 13 $ cycle ['a'..'z']
        uppercase = map toUpper lowercase

By now I think you’re dying to know what the message says, so let’s finish up here. We decipher a message by converting each integer to a character and performing a rot13 transformation — then we print it.

decipher = putStrLn . map (rot13 . chr)
main = getimage >>= decipher . msgbits

The whole thing is pulled together so we can run it from the command line and we receive the sensible output of:

Please follow this link:
https://cybersecuritychallenge.org.uk/834jtp.html https://cybersecuritychallenge.org.uk/834jtp.html

Next time I’ll look at what happens when we follow that link, and how we complete the next phase of the challenge.

If any of this doesn’t make sense or is confusing please ask questions in the comments. If you want to look at the code in full you can read the solutions for part 1 and part 2 online at http://www.dougalstanton.net/code/cybersecurity/.

As I mentioned in my introductory post I was incredibly impressed by the easy exploratory power of the Haskell code I wrote. Writing little segments to splice, decode, convert and transform made it simple to try out different ways of getting sensible output. I had no idea when I started which way the answer would take me, but putting them together in different combinations in the interpreter was easy and provided instant feedback. (That being said, Rich kinda floored me with his ability to wrangle Excel of all things into solving this problem, though that doesn’t mean it’s the right tool for the job!) Tune in next time for more exciting cryptographic games!

One response so far

Jun 28 2010

How filtered-accumulate generalises accumulate

Published by Dougal under Maths & Computer Science

I posted a couple of days ago about one of the earlier exercises (ex1.33) in SICP. After some discussion with Brent Yorgey on Facebook I had another look at some of the definitions given in the book. Yes, if in doubt, re-read the question! :-)

The issue that needed clarified was whether I could justify treating the null-value argument as a full-blown identity element. The text says:

a null-value that specifies what base value to use when the terms run out.

(It’s worth noting that the definition of combiner also isn’t specified very well: a “procedure (of two arguments) that specifies how the current term is to be combined with the accumulation of the preceding terms”.)

This isn’t abundantly clear but it suggests that null-value is the last element that can be included in the calculation, so that:

5 + 4 + 3 + 2 + 1 + 0

is allowed but

0 + 1 + 2 + 3 + 4 + 5

is not allowed, because I’m starting with my null-value (ie, 0) and adding the remaining values to it. In this case there is no difference in result. So I needed to find a situation where some element work as an identity on the right but not on the left. Wikipedia to the rescue!

The number 1 as an identity element for exponents if it’s the power but not if it’s the base. Using ^ as an infix exponent function, for any x:

x ^ 1 = x

but

1 ^ x = 1

We can create a chain of odd-numbered exponents to test our implementations of filter-accumulate:

(define (odd-exponent-chain a b)
  (filtered-accumulate odd? expt 1 identity a inc b))

An implementation of filtered-accumulate which cleaves as closely as possible to the definitions should implement (odd-exponent-chain 2 6) as (expt 3 (expt 5 1)).

This conclusion that null-value only works as a right-identity has further repercussions — most of the answers published for iterative version of accumulate are wrong. Why?

(define (accumulate combiner null-value term a next b) 
  (define (go acc n) 
    (if (> n b)
        acc
        (go (combiner acc (term a)) (next n)))) 
  (go null-value a))

The iterative versions always start with the null-value in the accumulator and slowly combine each new term until the end. But the definition above clearly states that null-value is to be combined with the results when there are no more terms left.

This is possible but not as elegant as the recursive solution by a long shot. There’s probably a good way of cleaning it up that I haven’t spotted yet.

(define (accumulate-iter combiner null-value term a next b)
  (define (go acc n)
    (if (n > b)
        (combiner acc null-value)
        (go (combiner acc (term n)) (next n))))
  (if (a > b)
      null-value
      (go (term a) (next a))))

Having got this far we realise we can no longer implement filtered-accumulate in terms of accumulate: the first term forms the seed for the accumulator whether it is appropriate or not. In this case we do have to create a new filtered-accumulate which is more powerful than accumulate on its own.

(define (filtered-accumulate-iter filter combiner null-value term a next b)
  (define (go acc n)
    (if (n > b)
        (combiner acc null-value)
        (go (combiner acc (term n)) (next n))))
  (define (start n)
    (cond ((n > b) null-value)
          ((filter n) (go (term n) (next n)))
          (else (start (next n)))))
  (start a))

After all this it seems prudent to ask “who’s being pedantic here?” and whether the mistakes identified are the right ones.

  • The lack of formal specification of parameters like null-value encourages assumptions. Most people doing these exercises assumed that null-value works as an identity element per the examples of (+,0) and (*,1). These examples reinforced rather than highlighted the assumptions.
  • With this assumption it is possible to implement filtered-accumulate in terms of accumulate, contra the question. I haven’t found an instance online where someone spotted this mismatch.
  • The “obvious” ways of writing the iterative and recursive accumulate have different semantics if you don’t pay close attention. (Which is another way of saying one of them is wrong.) Hence the problem with making poor assumptions.
  • The only way that requires a rewrite of filtered-accumulate is one that doesn’t assume a left-identity element.
  • There is an official tutor’s book published but not available online (presumably to prohibit cheating for universities who use this textbook). So we don’t know what the writers had in mind when they wrote that section.

It’s been an interesting few days of thinking! Many thanks to Brent for making me re-examine the text and my assumptions a further time. I think we got to the bottom of it in the end.

2 responses so far

Jun 26 2010

Does filter-accumulate generalise accumulate?

Published by Dougal under Maths & Computer Science

My latest interesting thought while doing the exercises in SICP is a bit confusing because the book is suggesting something which seems wrong. In Exercise 1.33, after creating a function accumulate (a kind of fold without lists where the start and end values are specified) it asks:

You can obtain an even more general version of accumulate by introducing the notion of a filter on the terms to be combined. That is, combine only those terms derived from values in the range that satisfy a specified condition. The resulting filtered-accumulate abstraction takes the same arguments as accumulate, together with an additional predicate of one argument that specifies the filter. Write filtered-accumulate as a procedure.

The tricky bit I see is the “even more general version” instruction. As far as I can tell filtered-accumulate can be written in terms of accumulate. It’s not more general or more powerful than the plain accumulate.

The previously-defined accumulate takes the following arguments:

  • a null-value and a combiner-function (which is to say the binary operator and identity for some monoid)
  • a pair of numbers representing the start and end value in the range we want to accumulate over
  • a function which maps values in this range to values in the monoid described above (and it could be the identity)
  • a function to get to the next value in the range given the current one

My (recursive) accumulate looks like this:

(define (accumulate combiner null-value term a next b)
  (if (> a b)
    null-value
    (combiner (term a) (accumulate combiner null-value term (next a) next b))))

Exercise 1.33 wants you to pre-filter values so that they don’t all get added together. Only those a which pass some predicate get combined into the result. But since the combining function comes specified with a null value it’s not necessary to skip over undesired results. All we need to do is convert any values which fail the test into the null value.

Put it this way: given a list of numbers between one and ten, you could find the sum of primes by either removing all the non-primes, or converting them into zeroes, because adding zero doesn’t affect the result. Similarly if you want the product of odd numbers between 14 and 103 you can filter out all the even numbers or turn them into 1s, because multiplying by 1 doesn’t affect the end result.

And it just so happens that there is already a way in accumulate to convert numbers before they are combined together, the term argument. This is used to square input values or apply some other function. The key argument is that this function can check if the input meets some criterion and then either process it further or replace it with null-value.

(define (filtered-accumulate filter combiner null-value term a next b)
  (define (filterterm n) (if (filter n) (term n) null-value))
  (accumulate combiner null-value filterterm a next b))

All the solutions to this exercise that I have seen online basically re-implement the rest of accumulate with extra filtering. This doesn’t seem to be necessary. My implementation above works for the following two exercises (sum of squares of primes, and product of all relatively prime integers).

But the exercise plainly says that filter-accumulate is “an even more general version of accumulate”, which to my mind means that filter-accumulate should be able to produce answers which standard accumulate cannot. (And that filter-accumulate cannot be implemented in terms of accumulate.) What is the counter-example that I’m missing, or is the book wrong?

Note: I’ve starting putting my solutions for the SICP exercises online, at http://www.dougalstanton.net/code/sicp/.

One response so far

Jun 19 2010

Initial thoughts on SICP study group

A few weeks ago I mentioned the idea of a study group at work, for the purpose of reading through some programming-relevant texts and discussing them. We’ve been working on Structure and Interpretation of Computer Programs (SICP) for a few weeks now. I’m enjoying it more than I expected. The exercises are mostly within grasp and I’m happy to note that I’m not being left behind intellectually. There are a few mathematical proof exercises I’ve been skipping and some of the more boring “run this computation by hand” are not worth taking to completion once you’ve seen the basic point. But mostly we’re following the exercises as planned.

Scheme is not so interesting so far. The parentheses still bug me and we haven’t got to the stage where dynamic typing has proven useful so I’m smarting from the lack of a decent type system. From what I remember when I skimmed the book in the past most of Scheme development involves creating ad hoc type systems in the runtime. :-) Many people insist that dynamic languages are good for something so hopefully this something will become apparent.

I have enjoyed several of the exercises which require rewriting recursive algorithms in an iterative style. Seeing the contrast and thinking up different routes to the end product is fun, and I’m glad we’ve got a unit test framework (SchemeUnit) to help in that regard. I tried to track down a Scheme version of QuickCheck but while one used to exist it’s disappeared off the net.

No responses yet

Oct 28 2009

What is this simulated annealing stuff anyway?

A couple of days ago I mentioned implementing simulated annealing for my comicbake program so I thought I would take some time to explain what that means and why it’s helpful. I’m sure there are some curious souls out there wondering what it’s all about!

He is seeking it, seeking it, all his thought is bent on it.

Let us start with a discussion of search.

Search is what you do when you don’t know how to get something by straightforward means. The type of search you do depends on what you’re looking for. Sometimes you know the goal but want to know how to get there. How to get from A to B is literally the problem solved by route-finding programs like Google Maps. The other type of search is where you don’t know what the goal is but you know what properties it has. You don’t know what your next chess move might be until you examine the board, but ideally it would involve not losing any pieces, and maybe gaining one of the opponent’s pieces.

Some searches are completely “uninformed” and will examine all possible solutions. This is obviously quite slow, so is avoided in all but the smallest of cases.

A simple improvement is to assume that any small change that improves the current solution can be used as a starting point to hunt for another small change. This is called “hill-climbing”, as we slowly ascend the terrain of the search space towards the highest point. An uninformed search would walk back and forwards across the landscape in a regular pattern as if looking for a contact lens. Hill climbing would never go back down hill, but follow the steepest course up hill.

The Gloaming from Beinn Bheoil.
The Gloaming from Beinn Bheoil.
© Iain Gillespie

The trouble with hill climbing, as any ramblers out there will know, is the minor summits. You get to the top of a small hill which seemed so all-encompassing, and you realise it actually stands in the shadow of a more massive peak. And to get onto that peak, you have to go downhill. Standard hill-climbing search doesn’t know there are other peaks and won’t ever go downhill to find them. It gets stuck in the foothills.

Simulated annealing is one further improvement to standard hill climbing search. It allows us to go down instead of up sometimes, if the direction we have chosen takes us down. We won’t always go downhill, as we want the gradual trend to be upwards, but a small proportion of downhill treks will help us avoid being trapped on the little peaks. To continue with the hill-climbing theme we can decide that the chances of going downhill as well as up should depend on how energetic we feel. Early in the day we may be willing to strike out in search of new peaks, but as the day wears on and the legs begin to tire, the adventurous spirit wanes.

Search with simulated annealing helps to avoid being trapped in locally good areas. The name in fact comes from the process of heating worked metal and cooling it very slowly — the molecules in this case are affected by temperature, allowing them to break out of their warped crystalline structure and slowly form better bonds as the temperature drops.

There were minstrels and mountebanks and harpers and clowns

What does all this have to do with where we started? With comic strips? You can probably see, in some vague way, how the search process might work if we actually wanted to find the tallest point in a range of hills, but how does this translate to the actual problem at hand?

In case there is any confusion, the problem at hand is choosing the best location to place speech bubbles on an individual panel of a comic strip, so that none of the bubbles overlap each other or hide the characters’ faces. Rather than jumping straight in to the solution we’ll work up to it slowly.

It helps to consider the problem in terms of the degrees of freedom we have at any point. How many things can we change at any step of the problem, in order to make the situation better or worse (ie, to ascend or descend our metaphorical hill)? If we were doing an actual hillwalk we could define our location in terms of x and y co-ordinates (such as latitude and longitude), and for every co-ordinate there would be a unique, corresponding height. Think of it as an association from co-ordinate to height:

(x,y) → Height

Here we have two degrees of freedom, as we can change each of the x and y co-ordinates independently to change our height.

In some situations you have more degrees of freedom. If you’re walking around your house looking for mobile phone signal you might also wave the phone in the air, or stand on the kitchen table or anything else if you think it will work. In this case you’re changing x, y and z co-ordinates. The value which changes is signal strength, and you’re trying to find the combination of co-ordinates which maximises signal.

(x,y,z) → Signal

This situation has three variable aspects, but other situations will have more. Every time we increase the number of variables the problem gets harder. Next you might imagine having to solve this problem to get several compatible solutions. If two people want to use their mobile phone at the same time they both can’t stand in the same spot. Now we need two places where the signal is strong! We can increase this to three, four, five…

But let’s pull things back a bit. If there are two mobile phone users in the same general area they both want good signal. We’ll say for the sake of explanation that we have three people waiting for really important calls:

Phone1(x,y,z) → Signal1
Phone2(x,y,z) → Signal2
Phone3(x,y,z) → Signal3

We want to maximise the sum of all the signals, so that everybody gets a good quality connection.

Maximise (Signal1 + Signal2 + Signal3)

To maximise the signal across all the phones we still only have the options of changing x, y or z for each. If we number each co-ordinate of each phone — so that Phone1 is located at point (x1,y1,z1) — we can see that we really need to maximise the following function:

(x1,y1,z1,x2,y2,z2,x3,y3,z3) → Overall Signal

That’s a lot of choices! And the more things there are to change, the harder the problem is and the more likely that we’ll waste time changing things which don’t affect the overall problem.

(The astute might have noticed that if everyone starts off with okay signal, we might be able to give someone really great signal at the cost of bumping someone else into a signal black hole. Overall the sums might not change, or they may prefer the unfair arrangement to the fair-but-mediocre one. We can avoid this by careful planning.)

Johnny Boo, Twinkle Power
Johnny Boo, Twinkle Power
© John Cooper

Now back to the problem again. We’ve got a selection of bubbles that all have to occupy unique places on the comic panel. There are a few places which they definitely can’t occupy (the space occupied by characters’ faces) and they also shouldn’t be out of order otherwise the comic won’t make much sense.

This is more or less the same as the mobile phone example without the ability to change z co-ordinate. For simplicity we actually invert the sense of the search — what we’re trying to do is find sea-level rather than the highest peak, though the principle is identical. To do that we invent a set of guidelines, and breaking each guidelines induces a penalty, or “cost”. The search process is trying to find an arrangement which reduces the cost to zero by not breaking any guidelines.

If we’re finding the ideal arrangement of three speech bubbles then this is the function we have to minimise. Notice the similarity to all the others before:

(x1,y1,x2,y2,x3,y3) → Cost

To calculate the cost we add up all the individual penalties which would each contribute to a “bad” solution:

  • Overlapping another bubble induces a penalty. There’s an additional penalty for each bubble that overlaps another, which reduces the chances of them all piling up at one spot on the image.
  • Overlapping a character’s face induces a penalty.
  • Being out of order induces a penalty proportional to how far out of order a bubble is. If the first bubble appears sixth that’s really bad — it’s at totally the wrong side of the image — so we need to ensure that kind of thing doesn’t happen.

With these simple rules we run the computer through the simple procedure of moving the bubbles randomly, assessing their cost and then repeating. They slowly settle on their chosen state and the search process ends.

Along the way we’ve been keeping note of best overall position of each speech bubble. If it’s the current position we leave them there, but if they have been rearranged since then we revert to the overall best formation.

This search procedure is widely used for a lot of difficult problems — problems which would take too long to solve exhaustively. It’s also surprisingly simple to implement the core. The hard part is coming up with suitable cost measures which are quick but effective.

4 responses so far

Oct 18 2009

Bouncing over hills and into valleys, searching for a solution

It’s been a busy few days on the code front. After coming to no real conclusions about the best way to lay speech bubbles out on a page deterministically, I’ve decided to explore the issue of search. I’ve knocked up a quick implementation of simulated annealing, which I’m now weaving in to rest of the code.

Like all random search methods, simulated annealing lives or dies on the quality of the heuristics you can provide. Since you don’t generally know the answer when you’re solving the problem you can’t test any intermediate solutions in the obvious manner. You instead have to test properties which you expect the solution to have. In this case, I have to test that, for example, the speech bubbles are laid out sequentially on the page. Any pair of speech bubbles which are out of order will ruin the sense of the comic, so this is fairly important.

There are other measures of goodness (or badness) of a solution, and some may be more effective than others. This has to be offset against the cost of computing the worth of each solution. If it takes 5 minutes to generate one scene it’s no use to me. Most people would rather place bubbles manually than wait this long. So speed is similarly important.

Right now I’m just waiting. I want to use the Mersenne Twister PRNG from Hackage, and it’s just my luck that the Hackage server appears to be down at the moment:

$ cabal install mersenne-random
Resolving dependencies...
Downloading mersenne-random-1.0...
cabal: Error: some packages failed to install:
mersenne-random-1.0 failed while downloading the package.

I’m optimistic about the results from this approach. I’ve been doing some reading on graph visualisation and also map labelling, which both have a lot in common with my problem. Simulated annealing appears to be effective in both instances, so I look forward to seeing what it produces in the end. When I get my shipment of random numbers…

No responses yet

Aug 31 2009

Edinburgh Haskell Hackathon went quite well

The International Conference on Functional Programming (ICFP) is taking place this week in Edinburgh. A substantial group of attendees are Haskell programmers and researchers, so it seemed a good idea to organise a Hackathon to work on libraries, infrastructure tools and various bits that everybody can use.

Eric Kow was the driving force behind this, and he managed to get us use of the conference facilities on Sunday, while there were a couple of workshops happening elsewhere in the building.

We didn’t have anywhere definite arranged for Saturday, so I went scouting a couple of weeks ago and identified a couple of places that would be suitable. I emailed and phoned Henderson’s and they weren’t horrified by the idea. We spent the day there, making sure to buy enough coffees to keep the management happy. There were 8 or 9 of us at the busiest point and we got some pretty amused/strange looks from other customers: a crowded table of people all with laptops open, ignoring each other…

P1010566
P1010566
© Brett Holman

In the evening we went out restaurant-hunting, which I don’t recommend in these circumstances — 9 people, Saturday night, Edinburgh festival, centre of town, no booking. I tried calling Calistoga (their number was still in my phone) as they’re both spacious and hidden from the tourists but they couldn’t take us either. We eventually found an Indian restaurant that would take us, so we idled in the pub across the road while they got ready for us.

On Sunday we were using the ICFP conference facilities, deep in the bowels of the Royal College of Physicians on Queen Street. It was a nice room, with little groups of tables, plenty of extension blocks for powering laptops, a flip chart and a projector. We even had wifi, though the networking infrastructure died a few times from heavy use. We were running off a separate system from the rest of the conference centre and their wireless access died, so there was an influx of random conference-goers looking for net access, which brought our wifi to its knees too. I had felt slightly guilty about taking people to a cafe with dodgy wifi the day before, but in fact the conference centre had an even less reliable connection! :-)

My sincere thanks to the staff of Henderson’s for putting up with us, to Graeme Hutton and Philip Wadler for getting us a room in the conference venue on Sunday, to Eric for doing all his organisation and to everybody who actually came along.

I’m putting a call out to everyone that attended to let me know what they did and so on. More updates when I hear back from them.

2 responses so far

Next »