Archive for October, 2009

Oct 28 2009

What is this simulated annealing stuff anyway?

A couple of days ago I mentioned implementing simulated annealing for my comicbake program so I thought I would take some time to explain what that means and why it’s helpful. I’m sure there are some curious souls out there wondering what it’s all about!

He is seeking it, seeking it, all his thought is bent on it.

Let us start with a discussion of search.

Search is what you do when you don’t know how to get something by straightforward means. The type of search you do depends on what you’re looking for. Sometimes you know the goal but want to know how to get there. How to get from A to B is literally the problem solved by route-finding programs like Google Maps. The other type of search is where you don’t know what the goal is but you know what properties it has. You don’t know what your next chess move might be until you examine the board, but ideally it would involve not losing any pieces, and maybe gaining one of the opponent’s pieces.

Some searches are completely “uninformed” and will examine all possible solutions. This is obviously quite slow, so is avoided in all but the smallest of cases.

A simple improvement is to assume that any small change that improves the current solution can be used as a starting point to hunt for another small change. This is called “hill-climbing”, as we slowly ascend the terrain of the search space towards the highest point. An uninformed search would walk back and forwards across the landscape in a regular pattern as if looking for a contact lens. Hill climbing would never go back down hill, but follow the steepest course up hill.

The Gloaming from Beinn Bheoil.
The Gloaming from Beinn Bheoil.
© Iain Gillespie

The trouble with hill climbing, as any ramblers out there will know, is the minor summits. You get to the top of a small hill which seemed so all-encompassing, and you realise it actually stands in the shadow of a more massive peak. And to get onto that peak, you have to go downhill. Standard hill-climbing search doesn’t know there are other peaks and won’t ever go downhill to find them. It gets stuck in the foothills.

Simulated annealing is one further improvement to standard hill climbing search. It allows us to go down instead of up sometimes, if the direction we have chosen takes us down. We won’t always go downhill, as we want the gradual trend to be upwards, but a small proportion of downhill treks will help us avoid being trapped on the little peaks. To continue with the hill-climbing theme we can decide that the chances of going downhill as well as up should depend on how energetic we feel. Early in the day we may be willing to strike out in search of new peaks, but as the day wears on and the legs begin to tire, the adventurous spirit wanes.

Search with simulated annealing helps to avoid being trapped in locally good areas. The name in fact comes from the process of heating worked metal and cooling it very slowly — the molecules in this case are affected by temperature, allowing them to break out of their warped crystalline structure and slowly form better bonds as the temperature drops.

There were minstrels and mountebanks and harpers and clowns

What does all this have to do with where we started? With comic strips? You can probably see, in some vague way, how the search process might work if we actually wanted to find the tallest point in a range of hills, but how does this translate to the actual problem at hand?

In case there is any confusion, the problem at hand is choosing the best location to place speech bubbles on an individual panel of a comic strip, so that none of the bubbles overlap each other or hide the characters’ faces. Rather than jumping straight in to the solution we’ll work up to it slowly.

It helps to consider the problem in terms of the degrees of freedom we have at any point. How many things can we change at any step of the problem, in order to make the situation better or worse (ie, to ascend or descend our metaphorical hill)? If we were doing an actual hillwalk we could define our location in terms of x and y co-ordinates (such as latitude and longitude), and for every co-ordinate there would be a unique, corresponding height. Think of it as an association from co-ordinate to height:

(x,y) → Height

Here we have two degrees of freedom, as we can change each of the x and y co-ordinates independently to change our height.

In some situations you have more degrees of freedom. If you’re walking around your house looking for mobile phone signal you might also wave the phone in the air, or stand on the kitchen table or anything else if you think it will work. In this case you’re changing x, y and z co-ordinates. The value which changes is signal strength, and you’re trying to find the combination of co-ordinates which maximises signal.

(x,y,z) → Signal

This situation has three variable aspects, but other situations will have more. Every time we increase the number of variables the problem gets harder. Next you might imagine having to solve this problem to get several compatible solutions. If two people want to use their mobile phone at the same time they both can’t stand in the same spot. Now we need two places where the signal is strong! We can increase this to three, four, five…

But let’s pull things back a bit. If there are two mobile phone users in the same general area they both want good signal. We’ll say for the sake of explanation that we have three people waiting for really important calls:

Phone1(x,y,z) → Signal1
Phone2(x,y,z) → Signal2
Phone3(x,y,z) → Signal3

We want to maximise the sum of all the signals, so that everybody gets a good quality connection.

Maximise (Signal1 + Signal2 + Signal3)

To maximise the signal across all the phones we still only have the options of changing x, y or z for each. If we number each co-ordinate of each phone — so that Phone1 is located at point (x1,y1,z1) — we can see that we really need to maximise the following function:

(x1,y1,z1,x2,y2,z2,x3,y3,z3) → Overall Signal

That’s a lot of choices! And the more things there are to change, the harder the problem is and the more likely that we’ll waste time changing things which don’t affect the overall problem.

(The astute might have noticed that if everyone starts off with okay signal, we might be able to give someone really great signal at the cost of bumping someone else into a signal black hole. Overall the sums might not change, or they may prefer the unfair arrangement to the fair-but-mediocre one. We can avoid this by careful planning.)

Johnny Boo, Twinkle Power
Johnny Boo, Twinkle Power
© John Cooper

Now back to the problem again. We’ve got a selection of bubbles that all have to occupy unique places on the comic panel. There are a few places which they definitely can’t occupy (the space occupied by characters’ faces) and they also shouldn’t be out of order otherwise the comic won’t make much sense.

This is more or less the same as the mobile phone example without the ability to change z co-ordinate. For simplicity we actually invert the sense of the search — what we’re trying to do is find sea-level rather than the highest peak, though the principle is identical. To do that we invent a set of guidelines, and breaking each guidelines induces a penalty, or “cost”. The search process is trying to find an arrangement which reduces the cost to zero by not breaking any guidelines.

If we’re finding the ideal arrangement of three speech bubbles then this is the function we have to minimise. Notice the similarity to all the others before:

(x1,y1,x2,y2,x3,y3) → Cost

To calculate the cost we add up all the individual penalties which would each contribute to a “bad” solution:

  • Overlapping another bubble induces a penalty. There’s an additional penalty for each bubble that overlaps another, which reduces the chances of them all piling up at one spot on the image.
  • Overlapping a character’s face induces a penalty.
  • Being out of order induces a penalty proportional to how far out of order a bubble is. If the first bubble appears sixth that’s really bad — it’s at totally the wrong side of the image — so we need to ensure that kind of thing doesn’t happen.

With these simple rules we run the computer through the simple procedure of moving the bubbles randomly, assessing their cost and then repeating. They slowly settle on their chosen state and the search process ends.

Along the way we’ve been keeping note of best overall position of each speech bubble. If it’s the current position we leave them there, but if they have been rearranged since then we revert to the overall best formation.

This search procedure is widely used for a lot of difficult problems — problems which would take too long to solve exhaustively. It’s also surprisingly simple to implement the core. The hard part is coming up with suitable cost measures which are quick but effective.

4 responses so far

Oct 19 2009

Offences against humour

Published by Dougal under Friends, Humour

A conversation over IM earlier today. I’m the guilty D in this exchange.

M: Then all you need is a sofa on wheels, and you can really travel in style.

D: I know where I can get one of them! Ssofa so good you might say, but how do I couch my idea in more marketable terms? I don’t want to lounge around all day, I want to chaise the big money!

M: Those were terrible.

D: You can’t deny the joy of punning — have a go, you ottoman.

M: You are a bad man.

D: It was all going so well, and then I just went and put my futon it.

M: I’m going to report you to the ‘Law and Order: Misuse of Puns’ division

D: Will they lock me up and take me away in divan?

Five minutes pass.

D: Oh no, I’ve killed him.

2 responses so far

Oct 18 2009

Bouncing over hills and into valleys, searching for a solution

It’s been a busy few days on the code front. After coming to no real conclusions about the best way to lay speech bubbles out on a page deterministically, I’ve decided to explore the issue of search. I’ve knocked up a quick implementation of simulated annealing, which I’m now weaving in to rest of the code.

Like all random search methods, simulated annealing lives or dies on the quality of the heuristics you can provide. Since you don’t generally know the answer when you’re solving the problem you can’t test any intermediate solutions in the obvious manner. You instead have to test properties which you expect the solution to have. In this case, I have to test that, for example, the speech bubbles are laid out sequentially on the page. Any pair of speech bubbles which are out of order will ruin the sense of the comic, so this is fairly important.

There are other measures of goodness (or badness) of a solution, and some may be more effective than others. This has to be offset against the cost of computing the worth of each solution. If it takes 5 minutes to generate one scene it’s no use to me. Most people would rather place bubbles manually than wait this long. So speed is similarly important.

Right now I’m just waiting. I want to use the Mersenne Twister PRNG from Hackage, and it’s just my luck that the Hackage server appears to be down at the moment:

$ cabal install mersenne-random
Resolving dependencies...
Downloading mersenne-random-1.0...
cabal: Error: some packages failed to install:
mersenne-random-1.0 failed while downloading the package.

I’m optimistic about the results from this approach. I’ve been doing some reading on graph visualisation and also map labelling, which both have a lot in common with my problem. Simulated annealing appears to be effective in both instances, so I look forward to seeing what it produces in the end. When I get my shipment of random numbers…

No responses yet

Oct 15 2009

Why “warm fuzzy things” isn’t a valid description

Published by Dougal under Computing

Whenever you have to configure something on a computer — especially in the case of wireless networking — you inevitably come across strange terminology mismatch. Each person that designs a user interface decides that the technical terms which the user needs to enter/know about are not friendly enough, so they get “translated” with some arbitrary scheme. Which is why the same information will be called password or passphrase or secret or authentication key depending on who tells it.

Maybe they have a point in some cases. If you’re steeped in the jargon none of it seems strange or frightening. But I can’t help thinking that maybe there would be less confusion if everybody used the same words to describe the same thing. From a quick check, neither the GNOME or Apple style guides deal with this issue.

The greatest example of this strange folly, and how it makes everything harder for everyone, is when you have to configure a standalone email client. None of the major ISPs seem to give out the standard details any more. Instead they have screenshots and step-by-step guides — a different guide for each popular mail client — telling how you how to enter the same data into differently-named boxes. Obviously this makes the problem of configuring an “unsupported” client much harder. Transferring details from one working mail client to a new one is even harder, as you ultimately have to guess which features the old mail client enables automatically, and vice versa. Nightmare.

No responses yet

Oct 15 2009

First thoughts on new Arborise album

Published by Dougal under Music, Reviews

One of the great discoveries we made when we saw Thea Gilmore live was her support act, solo acoustic guitarist Dan Arborise. I bought his album Around in Circles at the merchandise stall because I was particularly taken with his song To The Sea. It is a really beautiful collection of songs which I can totally recommend.

I bought his second album for Helen’s birthday. It’s called Of Tide & Trail. I haven’t got totally inside it yet but it seems a bit more vicious than his previous disc. Some swearing and pointed lyrics in songs like You’ll All Get What’s Coming To You. You have to listen carefully to find them though: on the surface it’s still beautiful music.

Of course you could say the first album wasn’t short of shocking thoughts: “what do you do when you find that everything that you’ve been taught to love is a lie?”.

3 responses so far

Oct 13 2009

More material to make you feel old

Published by Dougal under Life, Society

This is the second grocery-shopping post I have made in a row, which is slightly alarming. In future will all my blog posts have some theme of standing in the cheese aisle, fretting over which kind of cheddar to buy? I hope not.

Anyway, at some point in the last six months — and I can honestly say I refuse to hunt down the references because it simply does not matter — the supply of 100W incandescent light bulbs has dried up. By law. I refuse to act like it is my human right to buy bulbs that are hot enough to burn fingertips. No doubt there are professional Boring Farts out there doing that very thing, in between complaining about the weather forecast being in degrees Celsius and milk being sold in litres.

Instead I will say something that really matters:

  • The aforesaid Co-op (the one that occasionally stock Our Kind of Tea™) has decided that No 100W Bulbs means No 100W-equivalent Bulbs. I’m only asking for something like a 20W bulb, but the most I can buy is a 65W-equivalent (=14W actual).
  • Sooner or later there will be a generation that doesn’t know what a 100W light bulb means. Woah.

7 responses so far

Oct 11 2009

Can you encourage shops by buying their products?

Published by Dougal under Life

If there’s a product you buy regularly — like tea or washing powder or chocolate biscuits — which isn’t always available locally, you might be inclined to buy it when it’s there, to encourage the shop to stock the item you want.

I saw the brand of tea we like in the big Co-op at the Foot of the Walk the other day, so I bought two boxes instead of just one. Then I realised, if we have twice as much tea in the house, I’ll be buying it half as often. The money we spend on tea in a given year won’t change. So maybe there’s nothing I can actually do to implicitly encourage the Co-op to stock our preferred tea. Unless we drink twice as much tea as before.

2 responses so far

Oct 05 2009

Some intensive code churn to ComicBake; small changes at front

Published by Dougal under Programming

It’s been a week or so since I posted any updates about ComicBake, mostly because I’ve been taking several steps backwards in order to move forward again. Typical manoeuvre.

I replace Brent’s Diagrams library with bindings to GD, which gives more pixel-accurate control over sizes and locations of objects. Diagrams is really good for geometric stuff (and, eh, diagrams) but not so hot for accurate overlays.

The result is a lot of code churn with not an awful lot to show on the outside. The speech bubbles have become smaller and thinner, the edges have squared off and the dialogue tails have disappeared. All because I haven’t got round to adding them back again.

The major point of changing the graphics back end was to get reliable sizing and placement of objects, which will help me work through various methods of placing speech bubbles in a natural(ish) way. Some points which have come to mind.

  • Having speech bubbles overlapping slightly from left to right is not a problem. But when going from right back to left it can be confusing. Thus the conversation above. “This is just…” flows naturally into “Is that because…” but the “Um” then seems lost. It should be definitely lower than the second bubble, to make the order of speech clearer.

  • In a left/right conversation like the one illustrated, the most natural arrangement of bubbles is an interleaving pattern in the large gap down the middle of the screen. Each person’s dialogue should be shifted slightly in their direction. I have an idea that aligning the near edge of the bubbles vertically would look “right” also, but until I’ve actually seen what that looks right I can’t say for certain.

  • I would dearly love to know if there is any theory behind the placement of speech bubbles. Even if it’s just heuristic/aesthetic stuff like rules of thirds and so on. And, if this doesn’t seem a strange idea, advice on dealing with pathological arrangements — such as where the character at the bottom of the page speaks first, then one at the top.

Thoughts on this always appreciated.

In other news, the GD library has given me exact size data for my speech bubbles before I write them to the image. I can actually arrange the speech bubbles in the space knowing what size they’ll be in the end. This is good for the morale! I got to discard my rather annoying guessing function, which was slightly wrong in 100% of cases. Good riddance to that one.

Also making better use of the type system and typeclasses to harden some assumptions and reuse some of the same principles in different places. No doubt there are plenty more places this can be done. For now I’m going to stop fiddling with the graphics stuff and pay more attention to layout. Choosing where speech bubbles go is a fundamental part of the function of this program, and it’ll probably require some serious thinking and experimentation. I’m looking forward to it.

As usual, code can be browsed in the repository or checkout out locally with Darcs:

darcs get --partial http://www.dougalstanton.net/code/comicbake/

No responses yet