Feb 12 2008
Measuring the challenge, and a discussion of terse variables
There was a thread on Reddit recently that briefly landed in a discussion about the terse nature of Haskell variable names.
It doesn’t matter what language you’re in, you should be naming the pieces something meaningful. You don’t have to program Haskell that way. You can still use meaningful variable names, and meaningful names for a lot of other things too, pretty much without penalty.
I am as guilty as the next person of this kind of shenanigan. I wrote this piece of code today before reading this thread, so let’s see how badly it fares on the code-as-documentation scale:
module Main where

import Text.HTML.TagSoup
import Text.HTML.Download
import Text.Printf
import Text.PrettyPrint.HughesPJ
import Control.Monad
import Control.Arrow hiding ((<+>))
import Data.List
import System.Time

recipesUrl = "http://www.helenhare.net/food/index.php/challenge/"

nigella = do
    tags <- parseTags `liftM` openURL recipesUrl
    return $ map (chapTitle &&& stats) $ splitIntoChapters $ extractEntry tags
  where
    extractEntry      = takeWhile (~/= "</div>") . head . sections (~== "<div class=\"entry\">")
    splitIntoChapters = partitions (~== "<h3>")
    countAll          = length . filter (isTagOpenName "li")
    countFinished     = length . filter (any (isTagOpenName "a")) . partitions (isTagOpenName "li")
    chapTitle         = fromTagText . flip (!!) 1
    stats             = countFinished &&& countAll

ppchapters cs = render $ vcat $ map field cs
  where
    field (t,(d,p)) = hcat $ [text t, text (replicate (20 - length t) ' ')
                             ,rnum d, char '/', rnum p <+> equals <+> rnum (pc d p), char '%']
    rnum n | n >= 100 = int n
           | n >= 10  = space <> int n
           | otherwise= space <> space <> int n

pc n d = round $ (*100) $ fromIntegral n / fromIntegral d

main = do
    details <- nigella
    let totals = (sum *** sum) $ unzip $ map snd details
    day <- ctYDay `liftM` (getClockTime >>= toCalendarTime)
    putStrLn "'Nigella Express' Challenge"
    putStrLn $ ppchapters $ details ++ [("Totals",totals),("Time passed",(day+1,366))]
(If you’re still wondering what it does, it loads the Challenge page from Helen’s blog and counts the number of recipes per chapter, how many have been completed, and the percentage completion. The output if I run it today is as follows:
'Nigella Express' Challenge
Everyday Easy         1/ 13 =   8%
Workday Winners       1/ 14 =   7%
Retro Rapido          0/ 12 =   0%
Get Up and Go         0/ 11 =   0%
Quick Quick Slow      2/ 15 =  13%
Against The Clock     1/ 11 =   9%
Instant Calmer        2/ 15 =  13%
Razzle Dazzle         1/ 16 =   6%
Speedy Gonzales       3/ 11 =  27%
On The Run            1/ 14 =   7%
Hey Presto            5/ 15 =  33%
Holiday Snaps         3/ 23 =  13%
Storecupboard SOS     3/ 19 =  16%
Totals               23/189 =  12%
Time passed          43/366 =  12%
So there you go, we are neck-and-neck at the moment — we’ve done 12% of the challenge in 12% of the year, allowing for slight variance in rounding.)
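For anyone who wants to check the "neck-and-neck" claim without scraping the page, here is a minimal sketch that redoes the totals and the rounding from the figures printed above. It only needs base. The names chapters, totals and exactPercent are made up for this illustration; pc mirrors the helper in the program. Unrounded, the two figures are roughly 12.17% of recipes done and 11.75% of the year gone, which is the slight variance that the rounding hides.

```haskell
module Main where

import Control.Arrow ((***))

-- (chapter title, (recipes done, recipes in chapter)), copied from the
-- output printed above. 'chapters' is a hypothetical name for this sketch.
chapters :: [(String, (Int, Int))]
chapters =
  [ ("Everyday Easy",     (1, 13))
  , ("Workday Winners",   (1, 14))
  , ("Retro Rapido",      (0, 12))
  , ("Get Up and Go",     (0, 11))
  , ("Quick Quick Slow",  (2, 15))
  , ("Against The Clock", (1, 11))
  , ("Instant Calmer",    (2, 15))
  , ("Razzle Dazzle",     (1, 16))
  , ("Speedy Gonzales",   (3, 11))
  , ("On The Run",        (1, 14))
  , ("Hey Presto",        (5, 15))
  , ("Holiday Snaps",     (3, 23))
  , ("Storecupboard SOS", (3, 19))
  ]

-- Sum the "done" and "total" columns independently, as main does.
totals :: (Int, Int)
totals = (sum *** sum) (unzip (map snd chapters))

-- Unrounded percentage (hypothetical helper, not in the original program).
exactPercent :: Int -> Int -> Double
exactPercent n d = 100 * fromIntegral n / fromIntegral d

-- Same idea as pc in the program: percentage rounded to a whole number.
pc :: Int -> Int -> Int
pc n d = round (exactPercent n d)

main :: IO ()
main = do
  print totals                          -- (23,189)
  print (uncurry pc totals, pc 43 366)  -- (12,12)
  print (uncurry exactPercent totals, exactPercent 43 366)
```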
I was surprised to note that, as Haskell code goes, this ain’t the worst. There are some fairly expressive function names in there, especially in the first top-level function. The part that reads—
ppchapters cs = render $ vcat $ map field cs
  where
    field (t,(d,p)) = hcat $ [text t, text (replicate (20 - length t) ' ')
                             ,rnum d, char '/', rnum p <+> equals <+> rnum (pc d p), char '%']
    rnum n | n >= 100 = int n
           | n >= 10  = space <> int n
           | otherwise= space <> space <> int n
—is pretty horrendous though. :-) So, why did I specifically write that section as if all my variables had to be single character and none of my functions could go above 4 letters?
I have to say that I like the terseness of the notation here. This is a throw-away program — look Ma, no comments! — but even so, I think it’s fairly readable. To my mind, the more information stored in the type system the less you need to put on the page. I could have made everything so much more verbose but I don’t think it would really help.
prettyPrintChapters chapters = render $ vcat $ map field chapters
  where
    field (title,(done,total)) = hcat $ [text title, text (replicate (20 - length title) ' ')
                                        ,rnum done, char '/', rnum total <+> equals <+> rnum (percent done total), char '%']
    rnum number | number >= 100 = int number
                | number >= 10  = space <> int number
                | otherwise     = space <> space <> int number
Looking at the two side by side, the second one is darker, like a page of heavy type. If I may use a tenuous analogy for a moment, it’s like open spaces in design or in music. The smaller, neater notation is easier on the eye and focuses on the things that matter (the functions).
Given the chance to do it again I’d probably make it even neater. At present the physical layout of the functions doesn’t really reflect how things end up. Let’s see what can be done. First, there are really two things going on above: formatting of numbers and layout of characters. Really it should be obvious when I’m doing what, so let’s create two functions to left-align text (i.e., create a variable amount of space to the right) or right-align text (create a variable amount of space to the left). With these two it’s much easier to see what is going on:
ppchapters :: [(String,(Int,Int))] -> String
ppchapters = render . vcat . map f
  where
    f (t,(d,p)) = lalign t <> ralign d <> char '/' <> ralign p <+> equals <+> ralign (d `pc` p) <> char '%'
    lalign s = text s <> text (replicate (20 - length s) ' ')
    ralign n = text (replicate (3 - length (show n)) ' ') <> int n
I’ve simplified the name of the formatting function — field didn’t really mean anything — down to an elegant f. Anything else would throw the reader off the scent. The freedom to not name functions is just as important as choosing a good name, I think. Otherwise you end up with doThing or performOperation which are clumsy and uninformative.
But the focus of the rearrangement was to create a symmetry between the code and the output. And this is it. Compare this:
lalign t <> ralign d <> char '/' <> ralign p <+> equals <+> ralign (d `pc` p) <> char '%'
with this:
Speedy Gonzales       3/ 11 =  27%
I think that’s far more helpful to read than just using longer variables. But maybe I’m funny like that.
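If you want to play with the alignment idea without installing anything, here is a rough sketch of the same layout using plain strings instead of the pretty-printing combinators. It is a stand-in, not the code above: lalign, ralign and pc keep their names and column widths from the post, but they build Strings here, and row is a made-up name for what the post calls f.

```haskell
module Main where

-- Left-align a title in a 20-character column (pads on the right).
lalign :: String -> String
lalign s = s ++ replicate (20 - length s) ' '

-- Right-align a number in a 3-character column (pads on the left).
ralign :: Int -> String
ralign n = replicate (3 - length (show n)) ' ' ++ show n

-- Percentage of n out of d, rounded to a whole number.
pc :: Int -> Int -> Int
pc n d = round (100 * fromIntegral n / fromIntegral d :: Double)

-- 'row' is a hypothetical name; it plays the role of f in the post.
-- The pieces appear in the same left-to-right order as the printed line.
row :: (String, (Int, Int)) -> String
row (t, (d, p)) =
  lalign t ++ ralign d ++ "/" ++ ralign p ++ " = " ++ ralign (d `pc` p) ++ "%"

main :: IO ()
main = putStrLn (row ("Speedy Gonzales", (3, 11)))
-- prints: Speedy Gonzales       3/ 11 =  27%
```

The " = " literal stands in for the two <+> operators around equals, each of which contributes one space.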
6 Responses to “Measuring the challenge, and a discussion of terse variables”
I must say, I (by far) prefer the more descriptive variable names.
Now, if you add to that the more reasonable alignment, it makes it even more readable.
My tradition is to break it further up into where clauses, so that instead of the above you have something like
title doneOfTotal equals' percentDone
  where title'      = (<>) (lAlign title)
        doneOfTotal = (<+>) (rAlign done <> char '/' <> rAlign total)
        equals'     = (<+>) equals
        percentDone = rAlign (percent done total) <> char '%'
But that’s just me, I’m still a bit of a noob. Note that I haven’t checked that for correctness…
Formatting on that is all fubar :P sorry.
Thanks for the comment Justin! I corrected the comment formatting for you.
I admit I did create a bit of a false dichotomy between descriptive variables and descriptive functions. But I’ve noticed that since variables and functions aren’t syntactically any different it’s quite hard to tell them apart. (Especially when a function is passed in as an argument to another function…) To me, normal names mean functions and single-character names mean data. ;-)
I’ll agree on that one, but the two are completely (intentionally, I suppose) indistinguishable in Haskell. You can think of all functions as being values, or all values as being functions that return that value.
Really, I was discussing it in a thread on Reddit, but I’d like to see a language that supported (of course…) some richer format than plain text and provided both shorthand and long descriptive names for variables.
E.g. you could have x,y,z and a,b,c but also tag them with some kind of automatic documentation, so that you could say ‘I only want to see long names’ or ‘I don’t care about long names’ or ‘expand the long names in this function’.
But then, I’m kind of obsessed with views for code. Code folding may be the sexiest thing I’ve seen an IDE do, and I badly want to write some elisp to make emacs do it.
I also prefer the longer variable names. There is somewhat of a tension between introducing names (increasing verbosity but also readability) and removing them (by, e.g. making functions point free, reducing verbosity). Ultimately the challenge seems to be to create a coherent “language” of terms that makes sense for the domain under consideration that is still sufficiently compact to fit in your head: not an easy task! Your ralign function above is a classic example of something that makes more sense factored out though, since it’s making use of the language of lists in some code which is otherwise concerned with the language of pretty printing.
Where possible I try and make use of reusable abstractions such as those of abstract algebra to write nice code. For example, I recently made use of the abstraction that visible colors form a ring and also a left module over the real numbers: this lets you express e.g. color addition and scaling in a compact and understandable mathematical notation.
There are no easy answers! However, I guess we are lucky in that Haskell gives us so much freedom in how we express our ideas in code that we can even have this debate: using combinator style coding in Java is almost too horrible to contemplate!
I agree Max, the abstractions from abstract algebra seem to offer endless and fascinating ways to think about common problems. I don’t really understand anything more complicated than monoids or groups (and even then only on an intuitive level) so I don’t completely follow your colour/number example. I shall have to think about it!