Feb 12 2008

Measuring the challenge, and a discussion of terse variables

Published by Dougal at 11:06 pm under Programming

There was a thread on Reddit recently that briefly turned into a discussion about the terse nature of Haskell variable names.

It doesn’t matter what language you’re in; you should be naming the pieces something meaningful. You don’t have to program Haskell that way. You can still use meaningful variable names, and meaningful names for a lot of other things too, pretty much without penalty.

I am as guilty as the next person of this kind of shenanigan. I wrote this piece of code today before reading this thread, so let’s see how badly it fares on the code-as-documentation scale:

module Main where
 
import Text.HTML.TagSoup
import Text.HTML.Download
 
import Text.Printf
import Text.PrettyPrint.HughesPJ
 
import Control.Monad
import Control.Arrow hiding ((<+>))
import Data.List
import System.Time
 
recipesUrl = "http://www.helenhare.net/food/index.php/challenge/"
 
nigella = do
    tags <- parseTags `liftM` openURL recipesUrl
    return $ map (chapTitle &&& stats) $ splitIntoChapters $ extractEntry tags 
    where
        extractEntry = takeWhile (~/= "</div>") . head . sections (~== "<div class=\"entry\">")
        splitIntoChapters = partitions (~== "<h3>")
        countAll = length . filter (isTagOpenName "li")
        countFinished = length . filter (any (isTagOpenName "a")) . partitions (isTagOpenName "li")
        chapTitle = fromTagText . flip (!!) 1
        stats = countFinished &&& countAll
 
ppchapters cs = render $ vcat $ map field cs
    where field (t,(d,p)) = hcat $
            [text t, text (replicate (20 - length t) ' ')
            ,rnum d, char '/', rnum p <+> equals <+> rnum (pc d p), char '%']
          rnum n | n >= 100 = int n
                 | n >= 10  = space <> int n
                 | otherwise= space <> space <> int n
 
pc n d = round $ (*100) $ fromIntegral n / fromIntegral d
 
main = do
    details <- nigella
    let totals = (sum *** sum) $ unzip $ map snd details 
    day <- ctYDay `liftM` (getClockTime >>= toCalendarTime)
 
    putStrLn "'Nigella Express' Challenge"
    putStrLn $ ppchapters $ details ++ [("Totals",totals),("Time passed",(day+1,366))]

(If you’re still wondering what it does: it loads the Challenge page from Helen’s blog, counts the recipes in each chapter and how many of them have been completed, works out the percentage completed, and compares the overall total with how much of the (leap) year has passed. The output if I run it today is as follows:

'Nigella Express' Challenge
Everyday Easy         1/ 13 =   8%
Workday Winners       1/ 14 =   7%
Retro Rapido          0/ 12 =   0%
Get Up and Go         0/ 11 =   0%
Quick Quick Slow      2/ 15 =  13%
Against The Clock     1/ 11 =   9%
Instant Calmer        2/ 15 =  13%
Razzle Dazzle         1/ 16 =   6%
Speedy Gonzales       3/ 11 =  27%
On The Run            1/ 14 =   7%
Hey Presto            5/ 15 =  33%
Holiday Snaps         3/ 23 =  13%
Storecupboard SOS     3/ 19 =  16%
Totals               23/189 =  12%
Time passed          43/366 =  12%

So there you go: we are neck-and-neck at the moment. We’ve done 12% of the challenge (23/189 ≈ 12.2%) in 12% of the year (43/366 ≈ 11.7%), allowing for slight variance in rounding.)
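The scraping in `nigella` comes down to two list-splitting steps: cut the entry into chapters at each `<h3>`, then count the `<li>` items and the ones that contain a link. As a minimal sketch of that logic that runs offline (no network, no TagSoup), the code below uses a hypothetical cut-down `Tag` type, a hand-written `samplePage`, and a helper `partitionsOn` modelled on how TagSoup's `partitions` behaves (split at each matching element, dropping whatever comes before the first match). None of these names are part of TagSoup's API.

```haskell
-- Hypothetical cut-down tag type: just enough to mimic the challenge page.
data Tag = Open String | Close String | Text String
  deriving (Eq, Show)

-- Split a list at every element satisfying p, dropping anything before the
-- first match (modelled on the behaviour of TagSoup's partitions).
partitionsOn :: (a -> Bool) -> [a] -> [[a]]
partitionsOn p = go . dropWhile (not . p)
  where
    go [] = []
    go (x : xs) = let (chunk, rest) = break p xs in (x : chunk) : go rest

isOpen :: String -> Tag -> Bool
isOpen name (Open n) = n == name
isOpen _ _ = False

-- (chapter title, (finished recipes, all recipes)) for each <h3> section.
chapterStats :: [Tag] -> [(String, (Int, Int))]
chapterStats = map stats . partitionsOn (isOpen "h3")
  where
    stats chapter = (title chapter, (finished chapter, total chapter))
    -- the text node straight after the opening <h3>
    title (_ : Text t : _) = t
    title _ = "?"
    -- every <li> is a recipe
    total = length . filter (isOpen "li")
    -- a recipe counts as finished if a link appears before the next <li>
    finished = length . filter (any (isOpen "a")) . partitionsOn (isOpen "li")

-- Hypothetical sample input: two chapters, one linked (finished) recipe.
samplePage :: [Tag]
samplePage =
  [ Open "h3", Text "Hey Presto", Close "h3"
  , Open "ul"
  , Open "li", Open "a", Text "Recipe A", Close "a", Close "li"
  , Open "li", Text "Recipe B", Close "li"
  , Close "ul"
  , Open "h3", Text "Holiday Snaps", Close "h3"
  , Open "ul", Open "li", Text "Recipe C", Close "li", Close "ul"
  ]
```

Loaded into GHCi, `chapterStats samplePage` gives `[("Hey Presto",(1,2)),("Holiday Snaps",(0,1))]`.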

I was surprised to note that, as Haskell code goes, this ain’t the worst. There are some fairly expressive function names in there, especially in the first top-level function. The part that reads—

ppchapters cs = render $ vcat $ map field cs
    where field (t,(d,p)) = hcat $
            [text t, text (replicate (20 - length t) ' ')
            ,rnum d, char '/', rnum p <+> equals <+> rnum (pc d p), char '%']
          rnum n | n >= 100 = int n
                 | n >= 10  = space <> int n
                 | otherwise= space <> space <> int n

—is pretty horrendous though. :-) So, why did I specifically write that section as if all my variables had to be single character and none of my functions could go above 4 letters?

I have to say that I like the terseness of the notation here. This is a throwaway program (look Ma, no comments!), but even so, I think it’s fairly readable. To my mind, the more information stored in the type system, the less you need to put on the page. I could have made everything much more verbose, but I don’t think it would really help. Here is the same function with descriptive names:

prettyPrintChapters chapters = render $ vcat $ map field chapters
    where field (title,(done,total)) = hcat $
            [text title, text (replicate (20 - length title) ' ')
            ,rnum done, char '/', rnum total <+> equals <+> rnum (percent done total), char '%']
          rnum number | number >= 100 = int number
                      | number >= 10  = space <> int number
                      | otherwise     = space <> space <> int number

Looking at the two side by side, the second one is darker, like a page of heavy type. If I may use a tenuous analogy for a moment, it’s like open spaces in design or in music. The smaller, neater notation is easier on the eye and focuses on the things that matter (the functions).

Given the chance to do it again, I’d probably make it even neater. At present, the physical layout of the functions doesn’t really reflect what the output ends up looking like. Let’s see what can be done. First, there are really two things going on above: formatting numbers and laying out characters. It should be obvious which of the two I’m doing at any point, so let’s create two functions, one to left-align text (i.e., pad it with a variable amount of space on the right) and one to right-align it (pad it with a variable amount of space on the left). With these two, it’s much easier to see what is going on:

ppchapters :: [(String,(Int,Int))] -> String
ppchapters = render . vcat . map f
    where f (t,(d,p)) = lalign t <> ralign d <> char '/' <> ralign p <+> equals <+> ralign (d `pc` p) <> char '%'
          lalign s = text s <> text (replicate (20 - length s) ' ')
          ralign n = text (replicate (3 - length (show n)) ' ') <> int n
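To try the rewritten `ppchapters` on its own, here is a self-contained sketch. It assumes the `pretty` package’s `Text.PrettyPrint.HughesPJ` (normally installed alongside GHC) and copies `pc` from the first listing, with a type signature and a `Double` annotation added. Current Preludes export a Semigroup `(<>)` that clashes with the pretty-printer’s, hence the `hiding` import, which the 2008 code did not need.

```haskell
import Prelude hiding ((<>))   -- use the pretty-printer's (<>), not the Semigroup one
import Text.PrettyPrint.HughesPJ

-- Percentage of n out of d, rounded to an Int (pc from the first listing).
pc :: Int -> Int -> Int
pc n d = round $ (* 100) $ (fromIntegral n / fromIntegral d :: Double)

ppchapters :: [(String, (Int, Int))] -> String
ppchapters = render . vcat . map f
  where
    f (t, (d, p)) = lalign t <> ralign d <> char '/' <> ralign p <+> equals <+> ralign (d `pc` p) <> char '%'
    -- pad text with spaces on the right, up to 20 columns
    lalign s = text s <> text (replicate (20 - length s) ' ')
    -- pad a number with spaces on the left, up to 3 columns
    ralign n = text (replicate (3 - length (show n)) ' ') <> int n
```

Since `replicate` with a negative count returns an empty list, a title longer than 20 characters simply runs into the numbers rather than causing an error.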

I’ve simplified the name of the formatting function — field didn’t really mean anything — down to an elegant f. Anything else would throw the reader off the scent. The freedom to not name functions is just as important as choosing a good name, I think. Otherwise you end up with doThing or performOperation which are clumsy and uninformative.

But the focus of the rearrangement was to create a symmetry between the code and the output. And this is it. Compare this:

lalign t <> ralign d <> char '/' <> ralign p <+> equals <+> ralign (d `pc` p) <> char '%'

with this:

Speedy Gonzales       3/ 11 =  27%

I think that’s far more helpful to read than just using longer variables. But maybe I’m funny like that.


6 Responses to “Measuring the challenge, and a discussion of terse variables”

  1. Justin George on 13 Feb 2008 at 5:21 am

    I must say, I (by far) prefer the more descriptive variable names.

    Now, if you add to that the more reasonable alignment, it makes it even more readable.

    My tradition is to break it further up into where clauses, so that instead of the above you have something like

    title' . doneOfTotal . equals' $ percentDone
        where 
          title' = (<>) (lAlign title)
          doneOfTotal = (<+>) (rAlign done <> char '/' <> rAlign total)
          equals' = (<+>) equals
          percentDone = rAlign (percent done total) <> char '%'

    But that’s just me, I’m still a bit of a noob. Note that I haven’t checked that for correctness…

  2. Justin George on 13 Feb 2008 at 5:21 am

    Formatting on that is all fubar :P sorry.

  3. Dougal on 13 Feb 2008 at 7:59 am

    Thanks for the comment Justin! I corrected the comment formatting for you.

    I admit I did create a bit of a false dichotomy between descriptive variables and descriptive functions. But I’ve noticed that since variables and functions aren’t syntactically any different it’s quite hard to tell them apart. (Especially when a function is passed in as an argument to another function…) To me, normal names mean functions and single-character names mean data. ;-)

  4. Justin George on 13 Feb 2008 at 9:12 am

    I’ll agree on that one, but the two are completely (intentionally, I suppose) indistinguishable in Haskell. You can think of all functions as being values, or all values as being functions that return that value.

    Really, I was discussing it in a thread on Reddit, but I’d like to see a language that supported some richer format than plain text (of course…) and provided both shorthand and long descriptive names for variables.

    e.g. you could have x, y, z and a, b, c, but also tag them with some kind of automatic documentation, so that you could say ‘I only want to see long names’, ‘I don’t care about long names’, or ‘expand the long names in this function’.

    But then, I’m kind of obsessed with views for code. Code folding may be the sexiest thing I’ve seen an IDE do, and I badly want to write some elisp to make Emacs do it.

  5. Max on 13 Feb 2008 at 2:07 pm

    I also prefer the longer variable names. There is somewhat of a tension between introducing names (increasing verbosity but also readability) and removing them (by, e.g. making functions point free, reducing verbosity). Ultimately the challenge seems to be to create a coherent “language” of terms that makes sense for the domain under consideration that is still sufficiently compact to fit in your head: not an easy task! Your ralign function above is a classic example of something that makes more sense factored out though, since it’s making use of the language of lists in some code which is otherwise concerned with the language of pretty printing.

    Where possible I try and make use of reusable abstractions such as those of abstract algebra to write nice code. For example, I recently made use of the abstraction that visible colors form a ring and also a left module with the real numbers: this lets you express e.g. color addition and scaling in a compact and understandable mathematical notation.
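One way to read the colour example, as a sketch under assumptions rather than Max’s own code: treat a colour as a triple of real-valued RGB channels. Componentwise `+` and `*` then give the ring operations, and scaling every channel by a real number gives the module action. The `Colour` type, the `(*^)` operator and `blend` are hypothetical names, and negative channel values (which the ring needs for additive inverses) don’t correspond to any visible colour.

```haskell
-- Hypothetical RGB colour with real-valued channels.
data Colour = Colour Double Double Double
  deriving (Eq, Show)

-- Componentwise arithmetic gives the ring operations.
instance Num Colour where
  Colour r1 g1 b1 + Colour r2 g2 b2 = Colour (r1 + r2) (g1 + g2) (b1 + b2)
  Colour r1 g1 b1 * Colour r2 g2 b2 = Colour (r1 * r2) (g1 * g2) (b1 * b2)
  negate (Colour r g b) = Colour (negate r) (negate g) (negate b)
  abs (Colour r g b) = Colour (abs r) (abs g) (abs b)
  signum (Colour r g b) = Colour (signum r) (signum g) (signum b)
  -- an integer literal becomes a grey with that value in every channel
  fromInteger n = let x = fromInteger n in Colour x x x

-- Scalar multiplication: the reals acting on colours (the module structure).
infixr 7 *^
(*^) :: Double -> Colour -> Colour
k *^ Colour r g b = Colour (k * r) (k * g) (k * b)

-- Mix two colours using only the ring and module operations.
blend :: Double -> Colour -> Colour -> Colour
blend t c1 c2 = (1 - t) *^ c1 + t *^ c2
```

With these, a 50/50 mix of red and blue is `blend 0.5 (Colour 1 0 0) (Colour 0 0 1)`, which evaluates to `Colour 0.5 0.0 0.5`.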

    There are no easy answers! However, I guess we are lucky in that Haskell gives us so much freedom in how we express our ideas in code that we can even have this debate: using combinator style coding in Java is almost too horrible to contemplate!

  6. Dougal on 13 Feb 2008 at 3:13 pm

    I agree Max, the abstractions from abstract algebra seem to offer endless and fascinating ways to think about common problems. I don’t really understand anything more complicated than monoids or groups (and even then only on an intuitive level) so I don’t completely follow your colour/number example. I shall have to think about it!
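As a small, concrete instance of the monoid idea in this very program: the `totals` line in `main`, `(sum *** sum) $ unzip $ map snd details`, can also be written as a single fold, because a pair of monoids is itself a monoid. The sketch below uses `Sum` from `Data.Monoid`; the function name `totals` and its type are assumptions based on how `main` uses the result.

```haskell
import Data.Monoid (Sum (..))

-- Add up the (finished, total) counts of every chapter in one pass.
-- foldMap combines the (Sum, Sum) pairs componentwise, since a pair of
-- monoids is itself a monoid.
totals :: [(String, (Int, Int))] -> (Int, Int)
totals details = (getSum done, getSum total)
  where
    (done, total) = foldMap (\(_, (d, p)) -> (Sum d, Sum p)) details
```

Fed the chapter figures from the output above, both this and the original `(sum *** sum)` version give `(23,189)`.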