Jul 05 2008
Avoiding waiting for the bus with Haskell
Last year I was standing at the bus stop, awaiting my carriage home, and looking at the real-time LCD bus tracker next to me. If these things are really real-time, I thought to myself, they must be communicating with the buses and with each other so they know when buses are delayed or cancelled. Surely this information is available somewhere?
I looked around on the Lothian Buses website but found nothing. So I wrote them an email. I suggested they make the data available as some text feed or RSS — something simple to manipulate. I never got a response.
But earlier this year someone else noticed that Bus Tracker had been launched, which provides a front end to the bus data I was looking for. They had also added links to Google Maps to plot the stops of a bus, or to interactively choose your nearest bus stop. It’s a damned interesting site — not very pretty but quite powerful.
I thought it would be useful/interesting to write a program to query the BusTracker site. At this point I’m just writing a library to do the tasks. Wrapping an interface around this can come at a later date.
The full library can be found online, or get the darcs repository with:
darcs get http://www.dougalstanton.net/code/buses/
Let the games begin
The first step is declaring what we’ll need. Queries are submitted using HTTP GET requests, which makes everything very simple for us. We use the Haskell Curl bindings for that. We also need to parse the results — TagSoup is really handy for this, because the results we get back are pretty sloppy HTML.
module BusTracker (Query(..), Result(..), getBusTimes) where

import Network.Curl
import qualified Network.Curl.Code as CC
import Text.HTML.TagSoup
import Data.Char (isDigit, isLetter)
Next up is a Query datatype. The unique ID code for the bus stop is a requirement. There are also two optional parts, the desired arrival time and the service number. So you can look for all the buses arriving at 5pm, or all the number 25s or a combination of these two.
The URL to make these queries looks like “stopnumber?time!busnumber”. The ? and ! are only needed if the next part of the query is there. The ppQuery function converts the Query type into the right kind of string.
data Query = Q { queryBusStop :: String
               , queryBusTime :: Maybe String
               , queryBusNumber :: Maybe String
               } deriving Show

ppQuery q = queryBusStop q ++ time ++ service
  where time = maybe [] ("?" ++) (queryBusTime q)
        service = maybe [] ("!" ++) (queryBusNumber q)

stopcode_url = "http://mybustracker.co.uk/display.php?clientType=b&busStopCodeQuick="
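To make the query format concrete, here is a small standalone sketch that prints the string ppQuery builds for each combination of optional parts. The Query type and ppQuery are copied from above so the file loads on its own; the stop code is the one used in the GHCi session later on, and the time value "17:00" is only an assumed placeholder, since the format the server expects isn't spelled out here.

```haskell
-- Standalone copy of the Query type and ppQuery from the post,
-- so this example can be compiled without the rest of the library.
data Query = Q
  { queryBusStop   :: String
  , queryBusTime   :: Maybe String
  , queryBusNumber :: Maybe String
  } deriving Show

ppQuery :: Query -> String
ppQuery q = queryBusStop q ++ time ++ service
  where
    time    = maybe [] ("?" ++) (queryBusTime q)
    service = maybe [] ("!" ++) (queryBusNumber q)

-- "17:00" is an assumed time format, used purely for illustration.
main :: IO ()
main = mapM_ (putStrLn . ppQuery)
  [ Q "36232973" Nothing        Nothing      -- 36232973
  , Q "36232973" (Just "17:00") Nothing      -- 36232973?17:00
  , Q "36232973" Nothing        (Just "25")  -- 36232973!25
  , Q "36232973" (Just "17:00") (Just "25")  -- 36232973?17:00!25
  ]
```

The separators only appear when the part they introduce is present, which is exactly what the two maybe calls encode.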
At the end, we have to deal with answers from the server. A typical response will look like this (cleaned up a bit):
<pre>22 GYLE CENTRE <span class="handicap"> </span>*21</pre>
From left to right, we have bus number, its ultimate destination, indication of whether it can handle wheelchairs, and the projected arrival time (minutes from now). The asterisk indicates an estimate, presumably because the bus has not updated its location for some time.
I haven’t done very much to deal with this data. For example, I have left the arrival time as a string rather than converting it to a number or even adding it to the actual current time to get a real timestamp. One thing I did do was remove any asterisk from the front of the time and set the resultEstimated field accordingly.
data Result = R { resultBusNumber :: String
                , resultDestination :: String
                , resultDisabled :: Bool
                , resultArrivalTime :: String
                , resultEstimated :: Bool
                } deriving Show

emptyResult = R "" "" False "" False

parseLine r ts = r { resultBusNumber = head txt
                   , resultArrivalTime = if est then tail time else time
                   , resultDisabled = any (~== "<span class=\"handicap\">") ts
                   , resultDestination = unwords $ tail $ init txt
                   , resultEstimated = est }
  where txt = words $ innerText ts
        time = last txt
        est = '*' == head time
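If the arrival time were wanted as a number instead of a string, the conversion could look something like the sketch below. parseArrival is a hypothetical helper, not part of the library: it takes the raw field (with or without the leading asterisk) and returns the minutes together with the estimate flag, or Nothing when the text isn't a number. It uses only readMaybe from base.

```haskell
import Text.Read (readMaybe)

-- Hypothetical helper, not in the original library: split an
-- arrival-time field such as "*21" into (minutes, isEstimate).
-- Returns Nothing for empty or non-numeric input instead of crashing.
parseArrival :: String -> Maybe (Int, Bool)
parseArrival ('*' : rest) = fmap (\m -> (m, True))  (readMaybe rest)
parseArrival str          = fmap (\m -> (m, False)) (readMaybe str)

main :: IO ()
main = mapM_ (print . parseArrival)
  [ "*21"   -- Just (21,True)
  , "11"    -- Just (11,False)
  , ""      -- Nothing
  , "abc"   -- Nothing
  ]
```

Pattern matching on the asterisk also sidesteps head and tail, which fail on an empty string.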
Right, we should be ready to make requests!
We pass the remote server a Query and we might get back a list of Result. This is embodied in the type signature, Query -> IO (Maybe [Result]). Neat eh?
Rather cheekily, the remote server will tend to return a “200 OK” result even if there was a massive problem and it all went pear-shaped. So even though I check the status code, don’t expect much good to come of it.
It’s a little superfluous to wrap a list in a Maybe type but I think it makes the result more explicit.
getBusTimes :: Query -> IO (Maybe [Result])
getBusTimes qry = do
  (statcode, body) <- curlGetString (stopcode_url ++ ppQuery qry) []
  let result = if statcode == CC.CurlOK
                 then extractTimetable $ parseTags body
                 else []
  return $ if null result then Nothing else Just result
Parsing the big munge of data we receive from the server isn’t very pretty, but is considerably eased with Neil Mitchell’s TagSoup library, which makes very few assumptions about the deeper structure of whatever HTML you pass it.
This is a simple pipeline, split into three logical sections. First we want to pull out the actual table where all the relevant data is stored. Naturally it’s not in an HTML table, because that would be both sensible and useful, but luckily all the <div> tags have useful ID attributes. Next the table is split up into its individual rows. Finally the rows are parsed using the function I outlined above, parseLine. I made sure to pass in a default/empty Result value which is altered with data from the web page, so I can be reasonably sure that each result starts off from a known-good state.
extractTimetable = map (parseLine emptyResult) . getTableRows . getTable
  where getTable = takeWhile (~/= "</div>")
                 . dropWhile (~/= "<div id=\"displayDepartures\">")
        getTableRows = map (takeWhile (~/= "</pre>"))
                     . partitions (~== "<pre>")
In operation, it’s just as you’d expect.
*BusTracker> :set -fno-print-bind-result
*BusTracker> let query = Q "36232973" Nothing Nothing
*BusTracker> rs <- getBusTimes query
*BusTracker> maybe emptyResult head rs
R {resultBusNumber = "16", resultDestination = "HUNTERS TRYST", resultDisabled = True, resultArrivalTime = "11", resultEstimated = False}
*BusTracker>
Using this as a base library, we can write a neat program to tell us when to leave the building so that there won’t be a long wait for the next bus.
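The arithmetic at the heart of such a program is small. As a rough sketch only: minutesUntilDeparture below is a hypothetical function, not something in the library, and it assumes the arrival times have already been converted from strings to whole minutes. Given the length of the walk to the stop and the predicted arrivals, it picks the earliest bus that can still be caught and says how long to stay put before setting off.

```haskell
-- Hypothetical helper: given the walk to the stop (in minutes) and the
-- predicted arrival times (minutes from now), return how many minutes
-- to wait before leaving in order to meet the earliest catchable bus.
-- Nothing means every listed bus arrives before we could get there.
minutesUntilDeparture :: Int -> [Int] -> Maybe Int
minutesUntilDeparture walk arrivals =
  case filter (>= walk) arrivals of
    []        -> Nothing
    catchable -> Just (minimum catchable - walk)

main :: IO ()
main = do
  print (minutesUntilDeparture 5 [3, 11, 24])  -- Just 6
  print (minutesUntilDeparture 5 [2, 4])       -- Nothing
```

This takes the predictions at face value; estimated times (the ones marked with an asterisk) would probably deserve a safety margin.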
I wonder if such technical wizardry will actually prevent long waits for the next bus (how accurate is any of this data for instance), and whether you will write the neat program :-)
Why not turn this neat program into a mini site to share the Haskell-powered goodness with other residents of your city?
In the spirit of Web 2.0-ness I suggest the site be called Busr.
You could even register bu.sr (I don’t think it’s been registered yet).
I could write the neat program — it wouldn’t be many more lines — though it’s probably easier for the good people of Edinburgh to just bookmark a relevant URL for their nearest bus stop! Though it would probably be good practice in writing a web/Haskell application, something which I’ve not tried yet.
I think a more interesting application would be trying to cope with buses which don’t go to the end of the line (especially ones which stop at the depot at the end of the day). Though a service which emails you when you need to leave the building might be fun! ;-)
I was going to ask how your Haskell app would know you’re about to leave the building, but given that time travel is a built-in feature of the language I suppose you could just click a button when you left.
BTW I assume you’ve seen pizza_party (it made /. back in the day).
Given that Haskell has time travel and Python anti-gravity/flight, what powers does Ruby have? I’m sure there must be some humorous/mythical stories about it but I can’t think of any right now.
I can’t think of any either :-/
Whatever stories there are are probably written in Japanese.
ăăăȘă ^__^