Your harddrive *will* fail – it’s just a question of when

There are a few certainties in life: death, taxes and harddisk failure. I have no fewer than 6 failed harddisks sitting here on my desk, patiently awaiting their appointment with Mr Lump Hammer: 2 Seagates, 3 Maxtors and 1 Western Digital. This equates to roughly one disk failure per year. Perhaps this is not surprising given that I have about 9 working harddisks at the moment spread across various machines. Given the incredible tolerances to which harddisks are manufactured, perhaps it is a miracle harddisks work at all.

As an analogy, a magnetic head slider flying over a disk surface with a flying height of 25 nm with a relative speed of 20 meters/second is equivalent to an aircraft flying at a physical spacing of 0.2 µm at 900 kilometers/hour. This is what a disk drive experiences during its operation. –Magnetic Storage Systems Beyond 2000, George C. Hadjipanayis from Wikipedia

We all know we need to back-up our data. But it is a chore that often gets forgotten at the most critical periods. Here are my hints for preparing yourself for that inevitable ‘click of death’.

  • Buy an external USB/Firewire harddrive. 500GB drives are ridiculously cheap these days. Personally I don’t like back-up tapes due to experiences of them stretching and corrupting data.
  • Back-up images of the entire OS, not just the data. You can use Acronis TrueImage on Windows and SuperDuper on Mac OS X. This can save you days restoring your entire development environment and applications from scratch.
  • Back-up individual files as well as entire OS images. You don’t want to have to restore a whole image to retrieve one critical file. Windows Vista and Mac OS X Leopard both have back-up applications built into the OS.
  • Use a machine separate from your development machine as a source code server.
  • Use a RAID-1 (mirrored) disk on your main development machine[1]. It is worth noting that, with two disks instead of one, this roughly doubles the likelihood that you will have a harddisk failure to deal with, but it makes the likelihood of a catastrophic failure (losing both copies at once) much lower. Keep an identical 3rd drive on hand to swap in when a drive fails.
  • Back-ups aren’t much use if they get incinerated along with your office in a fire, so store copies off-site. For example you can:
  • Make sure any off-site copies are securely encrypted, for example using AxCrypt.
  • Automate your back-ups as far as possible. Computers are much better at the dull repetitive stuff.
  • Test restoring data once in a while. There is not much point backing up data only to find you can’t restore it when needed.
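
The RAID-1 trade-off mentioned in the list above can be made concrete with a little arithmetic. This is only a rough sketch: it assumes the two disks fail independently and with the same probability, and the 10% annual failure rate is a made-up figure for illustration, not a measurement. The function name is my own.

```python
from typing import Tuple


def raid1_failure_odds(p_single: float) -> Tuple[float, float]:
    """Return (P(at least one of two disks fails), P(both disks fail)).

    Assumes both disks fail independently with the same probability
    p_single over the same period. Disks from the same batch may well
    be correlated, so treat the result as a rough guide only.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    p_any = 1.0 - (1.0 - p_single) ** 2  # at least one disk needs replacing
    p_both = p_single ** 2               # both copies of the data are gone
    return p_any, p_both


if __name__ == "__main__":
    p = 0.10  # hypothetical annual failure rate for one disk
    p_any, p_both = raid1_failure_odds(p)
    print(f"one disk on its own fails: {p:.1%}")
    print(f"at least one of two fails: {p_any:.1%}")
    print(f"both of the two fail:      {p_both:.1%}")
```

With these assumed numbers, the chance of having some failure to deal with nearly doubles, while the chance of losing both copies in the same period is a small fraction of the single-disk risk.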

There are lots of applications for backing up individual files. So many, in fact, that no-one has any hope of evaluating them all (marketing tip: don’t write another back-up application – really). I also worry that data stored in their various proprietary formats might not be accessible in future due to the vendor going out of business. I find the venerable DOS xcopy adequate for my needs. I run it in a scheduled Windows batch file to automatically sync file changes on to my USB harddrive (i:) every night. Here it is in all its glory:

XCOPY c:\data i:\data /d /i /s /v /f /y /g /EXCLUDE:exclude.txt

The exclude.txt file is used to exclude subversion folders and intermediate compiler files:

\.svn\
.obj
.ilk
.ncb
.pdb
.bak
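
For anyone not on Windows, roughly the same idea can be sketched in Python. This is not a drop-in replacement for the xcopy line above, just an illustration of the technique: copy files that are missing from the back-up or newer than the back-up copy, and skip any path containing one of the exclude strings. The function names are my own, and the case-insensitive matching is an assumption on my part.

```python
import os
import shutil

# Substrings to skip, mirroring the exclude.txt entries above.
EXCLUDES = ["\\.svn\\", ".obj", ".ilk", ".ncb", ".pdb", ".bak"]


def is_excluded(path, excludes=EXCLUDES):
    """Return True if any exclude string occurs anywhere in the path."""
    # Normalise separators so the "\.svn\" entry also matches "/" paths.
    normalised = path.replace("/", "\\").lower()
    return any(pattern.lower() in normalised for pattern in excludes)


def sync_newer(src_root, dst_root, excludes=EXCLUDES):
    """Copy files that are missing from dst_root or newer in src_root.

    Returns the list of destination paths written. Nothing is ever
    deleted from dst_root, and empty folders are not created.
    """
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(folder, name)
            if is_excluded(src, excludes):
                continue
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            if os.path.exists(dst) and os.path.getmtime(src) <= os.path.getmtime(dst):
                continue  # the back-up copy is already up to date
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also copies the modification time
            copied.append(dst)
    return copied


if __name__ == "__main__":
    # Same source and destination as the batch file above.
    for path in sync_newer(r"c:\data", r"i:\data"):
        print(path)
```

Like the xcopy version, this stores plain files in a plain folder tree, so there is no proprietary format to worry about later.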

Which of the above do I do? Pretty much all of them actually. At least I try; I haven’t yet automated the offsite backup. This may seem rather excessive, but it paid dividends last month when gremlins went on the rampage here in the Oryx Digital office. I had 2 harddrive failures in 2 weeks. The power supply+harddisk+network card on my old XP development machine failed. Then, while I was in the process of moving everything to my new Vista development machine, one of the RAID-1 disks on the new machine failed.

Things didn’t go quite according to plan though. The new RAID-1 box wouldn’t boot from either harddisk. I have no idea why.

Also the last couple of weekly Acronis image back-ups had failed and I hadn’t done anything about it. I had recent back-ups of all the important data, but I faced a day or more reinstalling all the apps I had installed since the last successful image. It took several hours on the phone to Dell technical support and much crawling around on the floor before I could get the new RAID-1 box to boot off one harddisk. I was then able to rebuild RAID-1 using the spare harddisk I had on standby for such an eventuality. Nothing was lost, apart from my sense of humour.
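
A back-up job that fails quietly, like the image back-ups did here, is easier to catch if something checks how old the newest back-up file is. The sketch below is one possible approach, not a feature of Acronis or any other product: the folder path, the 8-day threshold (a weekly job plus a day's grace) and the function names are all assumptions for illustration.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def newest_backup_age_days(backup_dir, now=None):
    """Return the age in days of the newest file in backup_dir.

    Returns None if the folder contains no files at all.
    """
    now = time.time() if now is None else now
    mtimes = [
        os.path.getmtime(os.path.join(backup_dir, name))
        for name in os.listdir(backup_dir)
        if os.path.isfile(os.path.join(backup_dir, name))
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / SECONDS_PER_DAY


def backup_is_stale(backup_dir, max_age_days=8, now=None):
    """Return True if there is no back-up file newer than max_age_days."""
    age = newest_backup_age_days(backup_dir, now)
    return age is None or age > max_age_days


if __name__ == "__main__":
    image_dir = r"i:\images"  # hypothetical folder holding the weekly images
    if backup_is_stale(image_dir):
        print("WARNING: no recent image back-up found in", image_dir)
```

This only looks at file dates. It says nothing about whether the image can actually be restored, so it complements an occasional test restore and does not replace it.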

Dell offered to replace the defective harddisk under warranty, but I declined on the grounds that there is far too much valuable information on this disk (source code, digital certificate keys, customer details etc) for me to entrust it to any third party. Especially given that Dell reserve the right to refurbish the harddisk and send it to someone else. What if they forgot to wipe it? My experiences with courier companies also haven’t given me great confidence that the disk would reach Dell. And I didn’t want to receive a refurbished disk as a replacement. It just isn’t worth relying on a refurb given how cheap new harddisks are. So the harddisk has joined the back of the growing queue to see Mr Lump Hammer.

The availability of cheap harddisks and cheap bandwidth means that it has never been easier to back up your systems. No more fiddling with mag tapes. Of course it is possible that your harddisk will work perfectly until it becomes obsolete, but I think it would be very unwise to assume that this will be the case. Don’t say I didn’t warn you…

Further reading:

What’s your backup strategy? (the prolific and always worth reading Jeff Atwood beats me to the punch)

[1] RAID-1 is built in to some Intel motherboards and is available as a relatively inexpensive extra from Dell. You may have to ask for it though – it wasn’t listed as a standard configuration option when I purchased my Dell Dimension 9200.

[2] Since I wrote this article I installed the latest version of JungleDisk on my Vista box. On the 3 occasions I have tried to use it, it hung Vista to the point where I had to cut the power in order to reboot. I have now uninstalled it.