May 30 2008

Following links to nowhere

Published by Dougal at 7:49 pm under Bugs

Let’s start with an analogy. I’ve got the manuscript for my bestseller here and I want you to proofread it for me. My options are to email it to you or put it online somewhere for you to download. Email is an effective way of pushing it to you but it might be too large a file to attach. Alternatively I could send you a link which you can use to download from somewhere else. The advantage of this method is that it scales really well — if I want 50 friends to read my new book it’s just as quick to email them all with a link.

There is one disadvantage — if the file isn’t there then the would-be proofreader sees Error 404: File Not Found. And they give up.

Using this analogy we can look at the problem of programs moving data about in memory. There are two ways to do this: copy all the data from one place to another, or just copy the location where the data is stored. This is the same idea as emailing the whole document or just emailing the link.

The disadvantage of copying all the data around should be easy to see: it’s slow. Memory access is much slower than the processor, so every extra copy costs time that the program spends waiting. The alternative, passing references around instead of the data itself, is much faster. The only thing copied is a tiny link giving the location of the needed data.
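To make the difference concrete, here is a small sketch in Python. Python has no raw pointers, so object references stand in for them here, and the names `manuscript`, `full_copy` and `link` are made up for the illustration.

```python
import copy

# A stand-in for the large manuscript: a list of a million "pages".
manuscript = ["page text"] * 1_000_000

# "Emailing the whole document": build a second, independent list.
full_copy = copy.copy(manuscript)

# "Emailing the link": a second name for the very same list; nothing is duplicated.
link = manuscript

# An edit to the original shows up through the link but not in the copy.
manuscript[0] = "revised first page"

print(link is manuscript)       # True: both names refer to one object
print(full_copy is manuscript)  # False: the copy is a separate object
print(link[0])                  # revised first page
print(full_copy[0])             # page text
```

Making the copy means touching every one of the million entries, while creating the link costs the same however long the manuscript is.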

But — and this is really important — if the link points to the wrong data then the consequences can be catastrophic. If the in-memory link (which is called a “pointer”) doesn’t point anywhere then the application (or even the whole computer) can crash. Why such a big reaction to what is essentially a 404 error?

The details of this are quite interesting. Inside your computer there is an intricate hierarchy of privileges, which prevents programs from doing things they’re not allowed to do, like reading protected files without permission. One of the ways in which these privileges work is that each program is assigned an area of memory to use. It can’t just access any old location in memory. If a program tries to access memory outside its own assigned segment it will often be killed by the operating system. (If it were not killed, the program could do something dangerous like interfering with another program or with the operating system itself.) The operating system may state that the program was killed because of a segmentation fault, or give a similarly stern warning.

One common fault in programs is trying to follow pointers that don’t point anywhere. These are often called null pointers, since by convention they point to address zero in memory. Operating systems typically leave this location deliberately unassigned, so that no program owns it and there should never be any need to look there. Any program which does look there is assumed to be in error and killed by the OS.
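You can watch this happen from a safe distance. The sketch below starts a second Python interpreter and has it read from address zero using `ctypes.string_at(0)`, so only the child process is affected. It assumes a typical desktop operating system; the exact exit status varies by platform, so the code only reports it.

```python
import subprocess
import sys

# Child program: ask ctypes to read a C string starting at address 0.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_null_read():
    """Run the null read in a separate interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_null_read()

# A nonzero status means the child did not finish normally.
# On POSIX systems a negative value is the number of the signal that killed it.
print("child exit status:", status)
```

On Linux the status is usually the negated number of the segmentation fault signal, which is the operating system reporting that it killed the child.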

You know how common 404 errors are when browsing the web. Null pointers are just as common in programming. Pointers in general are used all the time and can be quite powerful in many situations. But they are also very dangerous, and many modern programming languages do their best to provide the advantages of pointers without the dangers. Sometimes the result is like a safety razor, with most of the function and fewer dangers, and sometimes the result is just the same dangerous tool with a warning sticker.
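Python is one example of the safety razor approach. Its version of a link to nowhere is the value `None`, and following it raises an ordinary error that the program can handle, rather than getting the whole process killed. The `find_chapter` function and the `book` dictionary below are invented for this sketch.

```python
def find_chapter(book, title):
    """Return the chapter text, or None when no such chapter exists."""
    return book.get(title)


book = {"Chapter 1": "It was a dark and stormy night."}

# The lookup fails, so we are handed None: a link that leads nowhere.
missing = find_chapter(book, "Chapter 2")

# Following the empty reference raises an exception instead of crashing the process.
try:
    length = len(missing)
except TypeError as error:
    outcome = "caught: " + type(error).__name__
else:
    outcome = "length " + str(length)

# Checking before following avoids the error entirely.
word_count = len(missing.split()) if missing is not None else 0

print(outcome)     # caught: TypeError
print(word_count)  # 0
```

The danger has not gone away, since a forgotten check still stops the program, but the failure is contained and can be caught.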
