Archive for the 'Computing' Category

Jan 05 2010

Filesystems and data recovery (an explanation of sorts)

Published by Dougal under Computing

I posted a short guide yesterday to recovering photographs from corrupted memory cards. There were a few interesting technical points in that post that I glossed over for the sake of the explanation, but discussion on Facebook has prompted me to try filling in those gaps. Let’s have a look.

File systems

How is stuff written to and read from a disk, and why would it suddenly stop working?

Reading and writing disks

Picture a disk as a big sheet of grid paper. Each little square in the grid can hold a single letter, and that little square has a location, like the grid reference. We can read and write a single letter at a time by knowing its grid reference. Most things which are written on the paper will take up many little squares, because most useful information is more than one letter long — words, sentences, paragraphs and so on are stored as long sequences of individual letters. This doesn’t really pose a problem because our sheet of paper is easily big enough to hold all the sentences we could ever write. It is also possible to store many different sequences of text on one sheet of paper, so that books sit alongside essays alongside love letters alongside formal complaints. The only problem this does leave us with is organising and retrieving the information, since a gridded sheet where each box contains a single letter will start to look more like a wordsearch after a while.

If we have been sensible we will write our texts onto the great sheet of paper in a consistent manner, such as left-to-right, and starting a new row one line down whenever we fill up the previous row. The system of organisation we use is arbitrary — we could just as easily use right-to-left, write vertically or move up a row instead of down at the end of each line. It doesn’t really matter as long as we are consistent. (This should be obvious from the history of written language!) We can then identify the beginning of a piece of text by knowing the grid reference of the first letter; and we can retrieve the whole text if we know how long the text is, since it was written in a consistent direction. If we do this we should easily be able to organise all the information we have by keeping track of the starting location and length of all the sequences of letters we write.

The disk, like the gridded paper, is full of little slots where data can be stored; and each slot has a location called its “address”. Data can be stored on disk in long sequences of adjacent slots. To get the sense of the original data we only need to know the starting address and to read the correct number of entries from that point onwards. All we need to extract a file from disk is two numbers — the address and the length. From this we can reconstruct the original data.
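
As a rough sketch of that idea in Python, a disk can be modelled as one long run of byte-sized slots. The names `disk`, `write_file` and `read_file` are made up for illustration and do not correspond to any real disk API.

```python
# A toy "disk": one long sequence of slots, each holding a single byte.
disk = bytearray(64)

def write_file(disk, address, data):
    """Store data in adjacent slots, starting at the given address."""
    disk[address:address + len(data)] = data

def read_file(disk, address, length):
    """Recover a file using only its starting address and its length."""
    return bytes(disk[address:address + length])

write_file(disk, 10, b"love letter")
write_file(disk, 30, b"formal complaint")

# Two numbers are enough to get the original data back.
print(read_file(disk, 10, 11))
print(read_file(disk, 30, 16))
```

Nothing in the slots themselves says where one file stops and the next begins; that knowledge lives entirely in the (address, length) pair.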

Now we have many long essays and books and letters stored on the hypothetical sheet of paper, and these long sequences of text can be reached via the much shorter pairs of numbers we mentioned, representing address and length. But where are these numbers kept? The one location that doesn’t require any intelligence to find on a grid of paper is the start, which we shall call address 0 (zero). How can we use this to our advantage? We could write a list, starting at address 0, which contains the pairs of numbers needed to find all the other sequences of letters! It will never be possible to “lose” the sequence at the top corner of the sheet of paper so we will always be able to access this contents listing for the rest of the page.

But think again: if the sheet of paper is covered in many sequences of letters, it’s very conceivable that one of those sequences will contain map references or temperatures or other lists of numbers that could be mistaken for the pairs of numbers which we use to find the texts themselves. If a sequence of pairs of numbers starts at address 0 how do we know when it has ended and a different sequence of numbers has started? We know the numbered pairs start at 0 but we don’t know how long they continue after that. The solution is to enter the length of the first sequence at address zero itself. It then becomes possible to start at the top of the page and read off a list of starting locations for every piece of information on that page, without fear of accidentally reading data which isn’t relevant.
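
Here is a minimal sketch of such a contents listing in Python. The layout is an assumption chosen to match the paper analogy (one byte per number, the count at address 0, the pairs straight after it); real file systems are considerably more elaborate, and the names `add_file` and `list_files` are hypothetical.

```python
# Toy layout (an assumption for illustration, not any real file system):
#   address 0           : number of entries in the contents listing
#   addresses 1, 2, ... : (address, length) pairs, one byte per number
disk = bytearray(64)

def add_file(disk, address, data):
    """Write data at the given address and record it in the listing."""
    count = disk[0]
    entry = 1 + 2 * count                 # slot where the next pair goes
    disk[entry] = address
    disk[entry + 1] = len(data)
    disk[address:address + len(data)] = data
    disk[0] = count + 1                   # the listing is now one entry longer

def list_files(disk):
    """Read the listing; the count at address 0 says when to stop."""
    pairs = []
    for i in range(disk[0]):
        entry = 1 + 2 * i
        pairs.append((disk[entry], disk[entry + 1]))
    return pairs

add_file(disk, 20, b"essay")
add_file(disk, 30, b"42 17 99")   # numbers inside a file, not listing entries

for address, length in list_files(disk):
    print(address, length, bytes(disk[address:address + length]))
```

Because `list_files` reads exactly as many pairs as the count at address 0 allows, the numbers stored inside the second file are never mistaken for part of the listing.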

The organisation of real disks and memory cards is done along these lines, and the system of addresses and references that makes up such an implementation is called a file system.

(Bonus question: Imagine that one of the sequences of text buried somewhere in our piece of paper was a list of location/length pairs just like the one at address zero. The location/length of this secondary list would be mentioned by the primary one, but the pairs of numbers it contains would not be mentioned. What use would this be? Can you think of a common instance of this sublist in computer file systems?)

Disk corruption and file recovery

In order to reuse a disk people often “format” it, making a clean slate. The formatting process generally doesn’t affect the data on disk, only the contents listings which help to make sense of it. We can format our grid of paper by altering the contents listing which we store as the first sequence. It’s not difficult — all we have to do is change the number at address zero, the one that represents the length of the contents listing, to a zero. From now on, reading that sheet of paper will immediately show that this sheet contains no data!
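
To make that concrete, here is a small Python sketch of formatting the paper-style disk. It assumes the same made-up layout as the analogy (the count of entries at address 0, followed by address/length pairs); `card`, `files_in_listing` and `quick_format` are illustrative names only.

```python
# Toy card: the listing holds one entry, a 5-byte file at address 10.
card = bytearray(32)
card[0] = 1                    # length of the contents listing
card[1], card[2] = 10, 5       # (address, length) of the only file
card[10:15] = b"photo"

def files_in_listing(card):
    """Return the (address, length) pairs the listing admits to."""
    return [(card[1 + 2 * i], card[2 + 2 * i]) for i in range(card[0])]

def quick_format(card):
    """Format by changing a single number: the count at address 0."""
    card[0] = 0

quick_format(card)
print(files_in_listing(card))   # the listing now claims the card is empty
print(bytes(card[10:15]))       # yet the old bytes are still sitting there
```

The single changed byte is all that separates a full card from an apparently empty one.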

Quick and reliable access to the text written on the vast hypothetical sheet of paper hinges on the reliability of the contents listing at the top of the page. It allows us to make sense of the lines and lines of individual letters that follow. If this gets corrupted or erased in any way, our sheet of paper is turned from a useful store into a tedious wordsearch. And as you can see with the formatting example, it takes only a tiny change to render the file listing “wrong”.

So, if through chance or misadventure your disk becomes unreadable in this way, you will not be able to read the data straight off. But it’s all still there, somewhere. The process of file recovery is one of trawling through the disk looking for interesting sequences of text, like a word search. Thankfully many files such as images or videos are organised like a file system in miniature: they contain useful header and file length info which aids identification. For example, JPEG files, the format digital cameras commonly use to store their images, all start with the same particular number (the two bytes FF D8). File recovery programs can trawl through the disk looking for known sequences such as this and attempt to pull useful data from the mess. In many cases it works very well.
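
A very rough Python sketch of that trawling follows. It is not how any particular recovery program works; it assumes every picture is stored in one contiguous run and ends at the first end-of-image marker, which real files do not always honour (embedded thumbnails, fragmentation). The function name `carve_jpegs` is hypothetical.

```python
JPEG_START = b"\xff\xd8\xff"   # start-of-image marker plus the first byte of the next marker
JPEG_END = b"\xff\xd9"         # end-of-image marker

def carve_jpegs(image):
    """Return (offset, data) for each JPEG-looking run of bytes in a raw image.

    Simplifying assumption: each picture is contiguous and ends at the
    first end-of-image marker found after its start.
    """
    found = []
    pos = 0
    while True:
        start = image.find(JPEG_START, pos)
        if start == -1:
            break
        end = image.find(JPEG_END, start + len(JPEG_START))
        if end == -1:
            break
        end += len(JPEG_END)
        found.append((start, image[start:end]))
        pos = end
    return found

# A fake "corrupted card": junk with two tiny pretend JPEGs buried in it.
fake_card = (b"junk" + b"\xff\xd8\xffAAAA\xff\xd9"
             + b"more junk" + b"\xff\xd8\xffBB\xff\xd9")

for offset, data in carve_jpegs(fake_card):
    print(offset, len(data))
```

Note that the contents listing is never consulted: the search relies only on what the files themselves look like.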

Jan 03 2010

Photo file recovery (with Linux tools)

Published by Dougal under Computing

Well, I’ve just done my good deed for the day by pulling some photographs off a corrupted memory card. Helen’s camera stopped talking to its SD card on Boxing Day and the photos from Christmas Day were assumed lost. The solution was quite painless:

  1. Copy the contents of the memory card to a file for safekeeping. The file ends up taking up as much space as the original disk does but it’s generally much nicer to take one copy and use it rather than risk further degradation if you’re worried that the memory card itself is dying.

    $ dd if=/dev/sdd of=memcard.img

    The card reader was named /dev/sdd from the viewpoint of my machine but this will no doubt depend on what kind of reader and how many devices you use. Adjust for your own circumstances. Typing dmesg into a terminal just after you’ve inserted the card in the reader will probably give you usable info — this is what I saw, in case it helps:

    [177907.297698] sd 5:0:0:1: Attached scsi generic sg4 type 0
    [177907.321601] sd 5:0:0:1: [sdd] Attached SCSI removable disk
    [177922.175874] sd 5:0:0:1: [sdd] 3994624 512-byte logical blocks: (2.04 GB/1.90 GiB)
    [177922.177838] sd 5:0:0:1: [sdd] Assuming drive cache: write through
    [177922.183593] sd 5:0:0:1: [sdd] Assuming drive cache: write through
    [177922.183608]  sdd: unknown partition table

    You can see it found the disk and named it sdd, but couldn’t read any partition data from it. This is just as we expected, because the card had been corrupted somehow.

  2. Install file recovery tools. This installs a disk recovery program and something that will trawl through disks looking for lost files. It looks like a direct port of a DOS/Windows app because the help command lists all CLI flags as being prefixed with a / instead of --. But it still works well.

    $ sudo aptitude install testdisk

  3. Run the newly installed photorec from the above suite. Any found files will be put into a newly-created directory called recovered. If this dir already exists it’ll create a new one with a numbered suffix rather than reusing it. I don’t know why.

    $ sudo photorec -d recovered memcard.img

    We were only expecting a few dozen photographs because the card had been emptied just before Christmas. In the end it pulled over 300 JPEGs and a video from the memory card — a nifty tool if you accidentally reformat a disk!
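
A quick sanity check on the copy made in step 1: the dmesg output reports 3994624 logical blocks of 512 bytes each, so a complete image should be exactly that many bytes long. The Python sketch below just does the arithmetic; `image_is_complete` is a hypothetical helper, and memcard.img is the file name used in the dd command above.

```python
import os

BLOCKS = 3994624      # from the dmesg line for sdd
BLOCK_SIZE = 512      # bytes per logical block, also from dmesg

expected = BLOCKS * BLOCK_SIZE     # 2045247488 bytes
print(expected)
print(expected / 10**9)            # size in GB (powers of ten)
print(expected / 2**30)            # size in GiB (powers of two)

def image_is_complete(path, expected_size=expected):
    """True if the image file is exactly as large as the card reported."""
    return os.path.getsize(path) == expected_size

# Example use, once the copy has finished:
# image_is_complete("memcard.img")
```

The two divisions correspond to the “2.04 GB/1.90 GiB” figures that the kernel printed for the card.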

The Christmas photos were saved and Helen will no doubt be posting them soon. Stay tuned.

Dec 21 2009

Spare a thought for those poor people without sunshine

Published by Dougal under Computing

The clock applet in GNOME is surprisingly useful to keep track of friends and family in foreign places, and considerably less naff than having real clocks on the wall labelled “New York”, “Paris”.

This is a screenshot from my desktop at work, showing locations of company offices, my brother and the launch site for the NASA test rocket last month. (Don’t judge me!)

[Image: screenshot of the GNOME clock applet’s world map]

I really like being able to see whether someone is likely to be up just by looking at the map. It also makes it very clear in winter which countries get more sunlight than others (hint: it’s the ones having their summer time!). And then there are those poor countries across the top which hardly see a ray of sunshine all winter. :-( I shall have to check in six months’ time how the coverage of light/dark across the globe has changed.

Anyone else got a similar gizmo which is apparently useless but that gives a great deal of excitement?

Nov 02 2009

Child safety locks are designed to keep adults out

Published by Dougal under Computing, Family

Mac users may (not) want to try this:

  • Press and hold the ctrl-alt-cmd keys (the three left-most keys on the bottom row)
  • Then repeatedly tap the full stop key

You may stop when your display looks a bit stupid. To reverse the process hold down the three modifiers and tap the comma key instead.

Now, tell me how you would manage all that by accident. Twice. And not notice that you were doing it either time until it was already done.

Once you’ve managed that, try dragging Mail.app from /Applications onto your desktop and doing a system update in a multi-user system. It is possible (though how, I am still unsure) to update the copy on your desktop while leaving an old copy elsewhere which other users will still try to use. The old version will cease to work if the OS update leaves it incompatible with the rest of the system.

All this is possible! All this can be yours!

Oct 18 2009

Bouncing over hills and into valleys, searching for a solution

It’s been a busy few days on the code front. After coming to no real conclusions about the best way to lay speech bubbles out on a page deterministically, I’ve decided to explore the issue of search. I’ve knocked up a quick implementation of simulated annealing, which I’m now weaving into the rest of the code.

Like all random search methods, simulated annealing lives or dies on the quality of the heuristics you can provide. Since you don’t generally know the answer when you’re solving the problem you can’t test any intermediate solutions in the obvious manner. You instead have to test properties which you expect the solution to have. In this case, I have to test that, for example, the speech bubbles are laid out sequentially on the page. Any pair of speech bubbles which are out of order will ruin the sense of the comic, so this is fairly important.

There are other measures of goodness (or badness) of a solution, and some may be more effective than others. This has to be offset against the cost of computing the worth of each solution. If it takes 5 minutes to generate one scene it’s no use to me. Most people would rather place bubbles manually than wait this long. So speed is similarly important.
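
For readers who haven’t met simulated annealing, here is a minimal Python sketch of the idea. It is not the code from this project: the one-dimensional layout (each bubble has only a vertical position), the cost function `out_of_order_pairs`, the linear cooling schedule and all parameter values are assumptions chosen to keep the example short.

```python
import math
import random

def out_of_order_pairs(ys):
    """Cost: how many pairs of bubbles appear in the wrong reading order.

    ys[i] is the vertical position of bubble i; bubble i should sit above
    bubble j (smaller y) whenever i < j.
    """
    return sum(1 for i in range(len(ys))
                 for j in range(i + 1, len(ys))
                 if ys[i] >= ys[j])

def anneal(ys, cost, steps=5000, start_temp=2.0, seed=0):
    """Simulated annealing over a list of positions (a sketch, not tuned)."""
    rng = random.Random(seed)
    current, current_cost = list(ys), cost(ys)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        temp = start_temp * (1 - step / steps) + 1e-9    # linear cooling
        candidate = list(current)
        # Random move: drop one bubble at a new position on the page.
        candidate[rng.randrange(len(candidate))] = rng.uniform(0, 100)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse layouts with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost

layout, badness = anneal([50, 10, 90, 30, 70], out_of_order_pairs)
print(badness, layout)
```

The cost function is evaluated once per step, which is why a cheap heuristic matters so much: thousands of steps multiplied by an expensive test quickly add up to the five-minute wait mentioned above. As it happens, Python’s `random` module is itself a Mersenne Twister.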

Right now I’m just waiting. I want to use the Mersenne Twister PRNG from Hackage, and it’s just my luck that the Hackage server appears to be down at the moment:

$ cabal install mersenne-random
Resolving dependencies...
Downloading mersenne-random-1.0...
cabal: Error: some packages failed to install:
mersenne-random-1.0 failed while downloading the package.

I’m optimistic about the results from this approach. I’ve been doing some reading on graph visualisation and also map labelling, which both have a lot in common with my problem. Simulated annealing appears to be effective in both instances, so I look forward to seeing what it produces in the end. When I get my shipment of random numbers…

Oct 15 2009

Why “warm fuzzy things” isn’t a valid description

Published by Dougal under Computing

Whenever you have to configure something on a computer (especially in the case of wireless networking) you inevitably come across a strange terminology mismatch. Each person that designs a user interface decides that the technical terms which the user needs to enter/know about are not friendly enough, so they get “translated” with some arbitrary scheme. Which is why the same information will be called password or passphrase or secret or authentication key depending on who tells it.

Maybe they have a point in some cases. If you’re steeped in the jargon none of it seems strange or frightening. But I can’t help thinking that maybe there would be less confusion if everybody used the same words to describe the same thing. From a quick check, neither the GNOME nor the Apple style guide deals with this issue.

The greatest example of this strange folly, and how it makes everything harder for everyone, is when you have to configure a standalone email client. None of the major ISPs seem to give out the standard details any more. Instead they have screenshots and step-by-step guides (a different guide for each popular mail client) telling you how to enter the same data into differently-named boxes. Obviously this makes the problem of configuring an “unsupported” client much harder. Transferring details from one working mail client to a new one is even harder, as you ultimately have to guess which features the old mail client enables automatically, and vice versa. Nightmare.

May 14 2009

Amazon wish list quite crappy really

Published by Dougal under Computing

The Amazon wish list appears to be the least intelligent thing on the internet, and I’m including the comments on the BBC’s Have Your Say forums in that generalisation.

  • If you buy something from your wishlist, it doesn’t disappear from your wishlist. In fact, the opposite happens — Amazon then advertises a whole bunch of related things which are totally redundant because of what you bought. There are only so many widescreen TVs a person can buy.
  • If you pre-order something it doesn’t disappear from your wishlist either. In fact, it remains on your wishlist with a button to “pre-order this item”.

I find the wishlist really useful to remember stuff that I want to buy in future but it’s completely useless at updating when you’ve bought something without visiting the wish list page first and choosing to buy from there.

May 12 2009

Failure time less than one month. Ouch!

Published by Dougal under Computing

On Saturday night I was googling something and the Firefox search bar refused to co-operate when I hit return. Nothing happened. At about the same time the automated updates running in the background seemed to hang or become similarly inactive. I didn’t see a connection between them at the time.

The next thing I remember is shutting the computer down (I no longer remember why). This produced what was probably a kernel panic. I couldn’t really see what was happening because the screen was also strobing in a most unpleasant fashion; I had the laptop lid 3/4 closed and Helen said it looked like there was a lightning storm inside the computer.

After a restart the computer didn’t get back on its feet. GRUB produced a number of different errors after each reboot but didn’t get very far. Booting from an Ubuntu install disc got me to the stage where I could query the system. The hard disc was completely inaccessible and produced huge quantities of I/O errors when it was being interrogated at startup.

Fearing the worst, I had a look at the BIOS. It said:

Hard Disk: None

Ah. I guess that explains things. My hard disk or the controller had given up the ghost. It was probably the disk because the internal CD/DVD drive was still working, and they probably share controllers.

I took the computer to John Lewis on Sunday afternoon and demonstrated the problem to their After Sales staff. They offered to order a direct replacement because they had none in stock, or I could take a wander to see if they had something else that would replace it just as well. I ended up with an HP Pavilion G60 laptop. It’s got a slightly more compact feel to it than the Toshiba. I hope the drive lasts longer though!

Apr 11 2009

I have no fond memories of that compiler

Another post about static typing (amongst other things) has just been posted to Reddit.

(Tangent: I’ve just discovered that pressing the left and right mouse buttons together emulates the awesomely useful middle mouse button in GNOME, presumably Linux, presumably other Unix variants. Excellent.)

Static typing in programming is a bit like grammar checking in writing. Except better, because grammar checking is a much harder job. Anyway, it stops you making dumb mistakes, but plenty of people object to it because either (a) they don’t make mistakes or (b) they know better than the checker.

I don’t particularly understand the problem many people have with static typing, though I think the problem may be to do with how badly most languages do it. (You can imagine quite going off grammar checkers if you only knew Microsoft Word. But if you had a real live editor to point out your flaws you’d probably appreciate things more. Obviously the comparison is not legitimate but I think you see the gist — if something is done badly it can become more of a hindrance and is better removed altogether.) I have never done anything where static type checking has held me back, but I’m sure such things exist.

One interesting comment to come out of the above discussion regards teaching. Is it better to have a strict language for teaching purposes, or should the novice programmer discover why their malformed program is bad? Despite preferring static typing, I am actually leaning towards no static typing for teaching purposes. I think falling off your bike a few times (metaphorically speaking, of course) will go some way to encourage more rigorous thinking, in the way that being told “no! you’re wrong!” before you try won’t. (Actually, I wonder if there is a language that exists which does static typing but will compile all the same. Then you can read any compilation errors and watch your program fail.)
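
One rough approximation of that last idea is Python’s optional type hints: a separate checker (mypy is one such tool) can report the type errors, but the interpreter ignores the annotations and runs the program anyway. The function `shout` below is made up for illustration, and this is only a sketch of the “read the errors, then watch it fail” experience, not a claim about how anyone teaches.

```python
def shout(words: list[str]) -> str:
    """Join words and upper-case them; the annotations are hints only."""
    return " ".join(words).upper()

print(shout(["hello", "world"]))

try:
    # A type checker would complain about this call before the program
    # runs, but nothing stops us from running it and falling off the bike.
    shout(42)
except TypeError as err:
    print("runtime failure:", err)
```

The second call gets past the interpreter’s front door and only fails once `join` is handed something it cannot iterate over.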

I certainly remember hating the compiler when I was learning Java at university. Hating it. It never seemed obvious or reasonable that the problem was with my code; it was just a capricious bully, that compiler. Maybe if I’d been able to see the disastrous effects of running my ridiculous code I would have learned that the compiler did know what it was doing after all. :-)

Apr 10 2009

How the other 90% live

Published by Dougal under Computing

I bought a new laptop yesterday, and I haven’t changed the OS yet. I’ve been doing some things inside Vista since last night. (I downloaded the Ubuntu 9.04 ISO and tried to burn it to CD. Found out Vista has no capacity to burn ISOs. Only learned this after it had tried to burn the ISO as a file to my blank CD. So that’s another coaster I’ve created due to useless user interface and completely inappropriate default behaviour.)

If I had bought this machine to use as-is I would be quite disappointed. It took about 45 minutes to set up with nearly no interaction from me — just repeated restarts and long periods of blank screen. The machine is so full of useless rubbish that it consumes 2GB of RAM and has about a dozen systray icons (all waving their little bubble notifications at me) at startup. McAfee just informed me that I had to restart the machine just after I turned it on this morning. Excuse me, whose machine is this?

No wonder people hate computers.
