Sep 04 2009
It’s like a comic for people who can’t draw
I discovered Dinosaur Comics a few years ago and I think it’s great. It’s a bit strange but you soon get used to the idea. The scene is always the same from comic to comic but the dialogue changes. (Sometimes this repetition is even mentioned in the stories.) Dinosaur Comics is not the only comic in this style, but it’s the first I read and the only one I come back to regularly.
I wondered how these comics were constructed. Most of the comic writers who work this way use graphics programmes such as Photoshop or Inkscape. I got to wondering if all that overhead was necessary. The only real change from episode to episode is the text, which can surely be written just as well in a text file as anywhere else. Is there not a program that can insert today’s script into the scene?
(Now I fully admit that there is a great skill in lettering and bubble placement when making comics. But I have recently come to notice that something which is 90% of the way there, or right-on 90% of the time, can be incredibly useful. And let’s not shoot down this project before it’s really airborne, eh?)
I did a bit of searching and found some fascinating work, but nothing that came close to what I was thinking of. The coolest thing I did find in the right area was automatic film-to-comic translation research. Using the screenplay and the movie side-by-side, the researchers’ program could extract relevant frames from the movie and place the correct dialogue in speech bubbles next to the actors. One of the authors has written an excellent page showing what their program does and has example scenes from a number of mainstream movies. Naturally, no source code is published. :-(
I decided in the end that any prior art on this subject was too well-hidden for me to find with Google. There were many fits and starts. My first attempts were almost immediately shelved because I was attempting to interface directly with Cairo to draw text and speech bubbles onto images. As nice as Cairo is, it’s not the right level of abstraction for this job.
Later on Brent Yorgey started his Diagrams project, building a functional combinator approach to drawing. This was exactly the approach I needed to get started, and I made some contributions so that it supported text. My sincere thanks to Brent for unwittingly kickstarting this project. I spent some time playing around with Diagrams and getting a feel for the best way to do things. I made a number of examples which could draw speech bubbles and place them on images at some desired location.
After a lot of thinking, and producing altogether too many doodles to be considered healthy, I finally got down to some serious procrastination. My problem was that I didn’t know where to start — I couldn’t see what shape the program was meant to be, and I didn’t work it out for a long time. But since I’m getting close to actually releasing the first version, I can say a good deal more about the structure and construction of the program.
First of all, it does indeed create “comics” out of written scripts. That you can be assured of. To create a three-panel comic strip you will need:
- A script with three scenes.
- At least one image, referenced in the script, which depicts the “scene” in the comic strip.
- A file which gives the co-ordinates of any characters that appear in the scene.
I apologise for the third requirement, but it’s the best way I know to ensure that speech bubbles can be attached to people in the scene, and to make sure bubbles don’t land in the middle of someone’s face. The idea is that you only have to write this file once: it will always be true for that image. I’ve also made it slightly easier, because the parser expects the co-ordinate data in the form of an HTML image map, as exported by any number of graphics programs. You can load your images into The GIMP and save out an image map with each character boxed and tagged. It’s cheap and it works.
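To make the image-map idea concrete, here is a minimal Python sketch that reads rectangular `<area>` elements using the standard library’s `html.parser`. It assumes each character is boxed with `shape="rect"` and tagged through the `alt` attribute. That tagging convention, and the names `ImageMapParser` and `load_image_map`, are illustrative assumptions, not what the real parser does.

```python
from html.parser import HTMLParser


class ImageMapParser(HTMLParser):
    """Collect rectangular <area> elements from an HTML image map.

    Assumption: each character is tagged via the alt attribute, e.g.
    <area shape="rect" coords="10,20,110,220" alt="T-Rex">.
    """

    def __init__(self):
        super().__init__()
        self.boxes = {}  # character name -> (x1, y1, x2, y2)

    def handle_starttag(self, tag, attrs):
        if tag != "area":
            return
        a = dict(attrs)
        if (a.get("shape") or "rect").lower() != "rect":
            return  # this sketch only understands rectangles
        name, coords = a.get("alt"), a.get("coords")
        if not name or not coords:
            return
        x1, y1, x2, y2 = (int(c) for c in coords.split(","))
        # Normalise so (x1, y1) is always the top-left corner.
        self.boxes[name] = (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def load_image_map(html_text):
    """Return {character name: box} for every tagged rectangle in the map."""
    parser = ImageMapParser()
    parser.feed(html_text)
    parser.close()
    return parser.boxes
```

Normalising the corners means it doesn’t matter which way round a box was dragged out in the graphics program.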
On the inside it’s modelled much like a compiler. The source file is the script, which is pretty close to the script you’d find for a stage play or screenplay. There are a number of obvious differences (specifying filenames, for example), but I’ve tried to minimise the amount of non-story writing an author has to do on a regular basis.
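The script format isn’t shown here, so the following is only a sketch against a hypothetical layout. A `SCENE <image>` line starts each scene, and every other non-blank line has the form `Name: dialogue`. `Scene` and `parse_script` are invented names for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    image: str                                   # background .png named by the script
    lines: list = field(default_factory=list)    # (speaker, dialogue) pairs in order

    @property
    def speakers(self):
        # Who speaks in this scene, in order of first appearance.
        return list(dict.fromkeys(name for name, _ in self.lines))


def parse_script(text):
    """Parse the hypothetical SCENE / 'Name: dialogue' format into Scenes."""
    scenes = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        if line.upper().startswith("SCENE "):
            scenes.append(Scene(image=line.split(None, 1)[1].strip()))
        elif ":" in line:
            if not scenes:
                raise ValueError(f"line {lineno}: dialogue before any SCENE")
            # Split on the first colon only, so colons inside dialogue survive.
            speaker, speech = line.split(":", 1)
            scenes[-1].lines.append((speaker.strip(), speech.strip()))
        else:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
    return scenes
```

Reporting the line number on a parse error keeps the “compiler” feel: the author learns exactly which line of the script to fix.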
Once the script is loaded, the program does two separate things. First, it reads through the script itself, noting down the names of the characters and everything they say in each scene. Obviously we want to know what the characters are saying, but knowing who is in a scene is also important. Second, it finds all the background images whose filenames are given in the script. At the moment I can only handle .png files, but this may change. For each image it then looks for a .map file with the same name and loads that to see which characters appear in the image. If a speaking character in the script doesn’t have a location in any of the images, we have a problem! So it’s important to know who we expect to find, and to know that this is accurate.
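The consistency check in that step might look something like this sketch. The names `map_path_for` and `missing_characters` are hypothetical. The check works per scene: a speaker must be boxed in the image for the scene where they talk. That is slightly stricter than requiring a location in any image.

```python
from pathlib import Path


def map_path_for(image_path):
    """Find the image-map file beside a background image (trex.png -> trex.map)."""
    path = Path(image_path)
    if path.suffix.lower() != ".png":
        raise ValueError(f"only .png backgrounds are handled: {image_path}")
    return path.with_suffix(".map")


def missing_characters(scenes, boxes_for_image):
    """List (scene number, speaker) for every speaker with no box in that scene's image.

    scenes:          [(image filename, [speaker names])] in script order
    boxes_for_image: {image filename: {character name: (x1, y1, x2, y2)}}
    """
    problems = []
    for number, (image, speakers) in enumerate(scenes, start=1):
        boxes = boxes_for_image.get(image, {})
        for name in speakers:
            if name not in boxes:
                problems.append((number, name))
    return problems
```

Collecting every problem, rather than stopping at the first, lets the author fix the whole map file in one pass.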
All along the way we note down this information for each scene: who is speaking? where are they? what are they saying? Once we know all this we can start placing speech bubbles around the characters. The canonical order for speech bubbles is left-to-right and top-to-bottom (I presume this changes for languages that read right-to-left or vertically?), so we try that in a very simple way. We create a big margin around every character, like a thick frame, which is where bubbles can be placed. Each time we place a bubble we make sure that the previous bubbles lie “above” or “to the left” of the current location, to keep the sense of the conversation.
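As a rough illustration of the margin-and-reading-order idea, here is a brute-force sketch. It assumes bubble sizes are already known from measuring the text. It scans candidate positions on a coarse grid inside each speaker’s frame and rejects any position that would:

- leave the image,
- cover a character,
- overlap an earlier bubble, or
- read before an earlier bubble.

`place_bubbles`, `margin` and `step` are hypothetical; this is not the program’s actual algorithm.

```python
def overlaps(a, b):
    """True if boxes (x1, y1, x2, y2) share any area; touching edges don't count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def inside(inner, outer):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def follows(prev, cur):
    """True if `prev` reads before `cur`: entirely above it or entirely to its left."""
    return prev[3] <= cur[1] or prev[2] <= cur[0]


def place_bubbles(lines, characters, image_size, margin=80, step=10):
    """Place one bubble per line of dialogue, in order.

    lines:      [(speaker, (width, height))] with bubble sizes measured beforehand
    characters: {name: (x1, y1, x2, y2)} boxes, e.g. from the image map
    Returns a bubble box per line, or None where nothing fitted.
    """
    image = (0, 0, image_size[0], image_size[1])
    placed = []
    for speaker, (w, h) in lines:
        x1, y1, x2, y2 = characters[speaker]
        # The "thick frame": the speaker's box grown by `margin` on every side.
        frame = (x1 - margin, y1 - margin, x2 + margin, y2 + margin)
        earlier = [b for b in placed if b is not None]
        spot = None
        # Scan candidate top-left corners top-to-bottom, then left-to-right.
        for y in range(frame[1], frame[3] - h + 1, step):
            for x in range(frame[0], frame[2] - w + 1, step):
                cand = (x, y, x + w, y + h)
                if (inside(cand, image)
                        and not any(overlaps(cand, c) for c in characters.values())
                        and not any(overlaps(cand, b) for b in earlier)
                        and all(follows(b, cand) for b in earlier)):
                    spot = cand
                    break
            if spot is not None:
                break
        placed.append(spot)
    return placed
```

A greedy first-fit like this is simple but can paint itself into a corner. A later bubble that doesn’t fit comes back as `None` rather than triggering any backtracking.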
Lastly we convert all this abstract data of locations, text and images into concrete files, which are written out to disk: frame1.png, frame2.png, frame3.png. At the time of writing there are a few annoying problems, the most important being that the bubbles have really rubbish tails. Creating good speech bubbles programmatically, and pointing the tails towards the speaker, is a problem I have not solved. The bubbles are a bit crap right now.
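Tails remain an open problem here. One naive starting point, sketched under loud assumptions, is a triangle whose base sits on the bubble edge nearest the speaker. Its tip points part of the way towards the top centre of the speaker’s box from the image map, used as a stand-in for the mouth. The sketch assumes the bubble does not cover that point. `tail_triangle`, `base_width` and `reach` are hypothetical.

```python
def tail_triangle(bubble, speaker, base_width=16, reach=0.5):
    """Return three points (base_a, base_b, tip) for a speech-bubble tail.

    Assumption: the tail aims at the top centre of the speaker's box.
    Boxes are (x1, y1, x2, y2); the bubble must not cover that target point.
    """
    bx1, by1, bx2, by2 = bubble
    sx1, sy1, sx2, _ = speaker
    tx, ty = (sx1 + sx2) / 2, sy1
    # Nearest point of the bubble box to the target: clamp the target into the box.
    cx = min(max(tx, bx1), bx2)
    cy = min(max(ty, by1), by2)
    half = base_width / 2
    if cy in (by1, by2):
        # Target is above or below: the base runs along the top/bottom edge.
        mid_x = min(max(cx, bx1 + half), bx2 - half)
        a, b = (mid_x - half, cy), (mid_x + half, cy)
    else:
        # Target is off to one side: the base runs along the left/right edge.
        mid_y = min(max(cy, by1 + half), by2 - half)
        a, b = (cx, mid_y - half), (cx, mid_y + half)
    # The tip travels a fraction `reach` of the way from the base to the target.
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2
    tip = (mx + reach * (tx - mx), my + reach * (ty - my))
    return a, b, tip
```

Stopping the tip short of the target (`reach` below 1) keeps the tail from stabbing into the character’s head.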
I hope to publish the code and some examples within the next couple of days. Look out for it!
Wow, this sounds great! I’ve (perhaps obviously) thought a bit about this problem myself over the years, but never made any movement on it because lettering the comic is something that I very much enjoy.
Some things I had considered: Having the frames in one image and the characters (in the places they’d be in the frames) in another, so that you know, down to the pixel, where people are appearing. If you assume everyone talks out of the top of their heads you know where to draw the voice bubble tails too - the only trick is knowing which character represents which person, and with that we’re back to the config file option.
One of the trickiest things is knowing how to fit text in when it’s tight: do you shrink the font? Cut into characters’ bodies? Try breaking up long words onto different lines?
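One possible policy for the tight-fit question, as a sketch:

- try font sizes from largest to smallest,
- wrap greedily at each size, and
- hyphenate a word only when it is longer than a whole line.

Glyph width is approximated as a fixed 0.6 em per character and line height as 1.2 em. That is an assumption; real lettering would measure the font. `fit_text` and `wrap` are hypothetical names, and cutting into characters’ bodies is not attempted.

```python
def wrap(words, max_chars):
    """Greedy word wrap; a word longer than a line is hyphenated across lines."""
    if max_chars < 2:
        raise ValueError("need room for at least one letter plus a hyphen")
    lines, current = [], ""
    for word in words:
        while len(word) > max_chars:
            if current:                      # finish the line in progress first
                lines.append(current)
                current = ""
            lines.append(word[:max_chars - 1] + "-")
            word = word[max_chars - 1:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:                                # word doesn't fit: start a new line
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def fit_text(text, box_w, box_h, max_size=20, min_size=8, char_w=0.6, line_h=1.2):
    """Return (font size, lines) for the largest size whose wrapped text fits
    the box, or None. Widths are approximated as char_w * size per character."""
    words = text.split()
    for size in range(max_size, min_size - 1, -1):   # shrink the font step by step
        max_chars = int(box_w // (char_w * size))
        if max_chars < 2:
            continue
        lines = wrap(words, max_chars)
        if len(lines) * line_h * size <= box_h:
            return size, lines
    return None
```

Shrinking before hyphenating favours keeping words whole. Swapping the order is a one-line change if broken words are preferred over small text.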
Anyway this is just me rambling - I’m looking forward to seeing what you come up with!
Neat! I’m always glad to unwittingly inspire people. =)