Harddisk woes

I was busy programming a few days ago when the machine froze for a few seconds, followed by an error message from the Intel Matrix RAID controller that one of the harddisks in my RAID1 (mirrored) pair had failed. Damn. This is the second time this has happened on this machine in the 2.5 years I have had it. I don’t seem to have much luck with harddisks. It might not be a coincidence that it happened on one of the hottest days of the year. I removed the defective disk, put in an identical spare I had bought for just such an eventuality and rebuilt the RAID1 pair from the surviving harddisk. I felt quite pleased with myself.

A couple of days later the same error message appeared. The new disk had apparently failed. Double damn. I rebooted a couple of times. No joy. It seems unlikely that an unused disk would fail within 48 hours; perhaps it is the RAID controller? I updated to the latest Intel Matrix RAID driver and swapped the two disks around. It still wouldn’t recognize the newly added harddisk, so it seems the new disk really is defective. I then swapped the working disk with the harddisk that had failed a couple of days earlier. The ‘failed’ harddisk booted OK! Something strange is going on here.

I could probably send the failed disk back to Seagate, but I am simply not prepared to risk my sensitive data to save myself £50. I tried to order another identical harddisk but, inevitably, the same model is no longer available 2.5 years later. The disks are:

SEAGATE BARRACUDA 7200.10 ST3500630AS 500GB 7200RPM 16MB SATA-300 3.5"

Apparently the .10 is the generation number (thanks to Dennis on the ASP forums for that).

I am currently running the machine on the one good harddisk and being very conscientious about my backups. I am undecided about what to do next.

  1. Order a 7200.12 disk and see what happens when I plug it in.
  2. Replace the RAID controller. I believe the Intel Matrix RAID controller is firmware on a chip on the motherboard, so replacing it doesn’t sound like much fun. And it isn’t clear that it is the cause of the problem.
  3. Buy a new PC. This one is only 2.5 years old, and it means stumping up a load of cash and all the hassle of moving everything over. I would rather wait until Windows 7 is released before I consider buying a new machine (I am thinking about getting someone like overclockers.co.uk to build me a lean, mean, 64-bit, compiling machine).

Option 1 sounds like the easiest and cheapest option. Any other ideas? Is it safe to pair a 7200.10 and a 7200.12 of the same size in a RAID1 array?

Your harddrive *will* fail – it’s just a question of when

There are a few certainties in life: death, taxes and harddisk failure. I have no fewer than 6 failed harddisks sitting here on my desk patiently awaiting their appointment with Mr Lump Hammer: 2 Seagates, 3 Maxtors and 1 Western Digital. This equates to roughly one disk failure per year. Perhaps this is not surprising given that I have about 9 working harddisks at the moment spread across various machines. Given the incredible tolerances to which harddisks are manufactured, perhaps it is a miracle they work at all.

As an analogy, a magnetic head slider flying over a disk surface with a flying height of 25 nm with a relative speed of 20 meters/second is equivalent to an aircraft flying at a physical spacing of 0.2 µm at 900 kilometers/hour. This is what a disk drive experiences during its operation. – Magnetic Storage Systems Beyond 2000, George C. Hadjipanayis (via Wikipedia)

We all know we need to back up our data, but it is a chore that often gets forgotten at the most critical times. Here are my hints for preparing yourself for that inevitable ‘click of death’.

  • Buy an external USB/Firewire harddrive. 500GB drives are ridiculously cheap these days. Personally I don’t like back-up tapes, because I have had them stretch and corrupt data.
  • Back up images of the entire OS, not just the data. You can use Acronis TrueImage on Windows and SuperDuper on Mac OS X. This can save you days of rebuilding your entire development environment and applications from scratch.
  • Back up individual files as well as entire OS images. You don’t want to have to restore a whole image to retrieve one critical file. Windows Vista and Mac OS X Leopard both have back-up applications built into the OS.
  • Use a machine other than your development machine as your source code server.
  • Use a RAID-1 (mirrored) disk on your main development machine[1]. It is worth noting that this roughly doubles the likelihood of a harddisk failure (there are now two disks that can fail), but makes the likelihood of losing data much lower, because both disks have to fail before the first one is replaced. Keep an identical 3rd drive on hand to swap in when a drive fails.
  • Back-ups aren’t much use if they get incinerated along with your office in a fire, so store copies off-site. For example you can:
  • Make sure any off-site copies are securely encrypted, for example using Axcrypt.
  • Automate your back-ups as far as possible. Computers are much better at the dull repetitive stuff.
  • Test restoring data once in a while. There is not much point backing up data only to find you can’t restore it when needed.
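The claim in the RAID-1 hint, that mirroring doubles the chance of a disk failure while making data loss much rarer, can be made concrete with a little arithmetic. The sketch below is a rough model, not a reliability calculation: it assumes drive failures are independent and that every drive has the same annual failure probability p. As a crude value for p it borrows this post’s own figures (roughly one failure a year across about nine working drives, so p ≈ 1/9). The 3-day replacement window and the function names are illustrative assumptions.

```python
# Rough RAID-1 risk arithmetic. Assumptions (not measured data):
# drive failures are independent and each drive has the same
# annual failure probability p.


def p_any_failure(p: float, drives: int = 2) -> float:
    """Probability that at least one of `drives` disks fails in a year."""
    return 1.0 - (1.0 - p) ** drives


def p_mirror_loss(p: float, rebuild_days: float) -> float:
    """Approximate probability of losing a two-disk mirror in a year.

    Data is lost only if the surviving disk also fails before the failed
    disk has been replaced and the mirror rebuilt. The survivor's failure
    probability over that window is approximated as p * rebuild_days / 365.
    """
    window_p = p * rebuild_days / 365.0
    return p_any_failure(p, 2) * window_p


if __name__ == "__main__":
    p = 1 / 9  # crude estimate: ~1 failure a year across ~9 drives
    print(f"single disk fails:          {p:.3f}")
    print(f"mirror sees a disk failure: {p_any_failure(p):.3f}")
    print(f"mirror loses data (3 days): {p_mirror_loss(p, 3):.5f}")
```

With p = 1/9 the chance that at least one disk of the pair fails in a year is 1 − (8/9)² = 17/81, about 0.21, close to double the single-disk figure of about 0.11. Data is only lost if the second disk dies inside the short replacement window, which this model puts well under 0.1%. Independence is an optimistic assumption, though: identical disks from the same batch, running in the same hot case, may well fail close together.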

There are lots of applications for backing up individual files. So many, in fact, that no-one has any hope of evaluating them all (marketing tip: don’t write another back-up application – really). I also worry that data stored in their various proprietary formats might become inaccessible if the vendor goes out of business. I find the venerable DOS xcopy adequate for my needs. I run it from a scheduled Windows batch file that automatically syncs changed files to my USB harddrive (i:) every night. Here it is in all its glory:

XCOPY c:\data i:\data /d /i /s /v /f /y /g /EXCLUDE:exclude.txt

The exclude.txt file lists the subversion folders and intermediate compiler files to skip.

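For anyone not on Windows, or curious about what those XCOPY switches amount to, here is a rough Python equivalent. It is a sketch under assumptions, not a drop-in replacement: it treats the destination as a directory (/i), walks subdirectories (/s), copies a file only when the backup copy is missing or older than the source (/d), overwrites without prompting (/y), skips any file whose absolute path contains one of the exclusion strings (approximately how XCOPY treats the lines of its /EXCLUDE file) and, as a stand-in for /v, compares SHA-256 digests after each copy. It ignores /f and /g. The function names and the exclusion patterns in the example comment are hypothetical, not the contents of the author’s exclude.txt.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_copy(src: str, dst: str, exclude: list[str]) -> list[str]:
    """Copy new or changed files from src to dst; return the paths written.

    Rough analogue of: XCOPY src dst /d /i /s /v /y /EXCLUDE:...
    """
    src_root = Path(src)
    dst_root = Path(dst)
    copied: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            source = Path(dirpath) / name
            # /EXCLUDE: skip files whose absolute path contains any pattern.
            if any(pattern in str(source.resolve()) for pattern in exclude):
                continue
            target = dst_root / source.relative_to(src_root)
            # /d: leave the backup alone if it is at least as new as the source.
            if target.exists() and source.stat().st_mtime <= target.stat().st_mtime:
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            # copy2 also copies the modification time, so the /d check
            # skips this file on the next run unless the source changes.
            shutil.copy2(source, target)
            # Stand-in for /v: confirm the copy matches the original.
            if sha256_of(source) != sha256_of(target):
                raise OSError(f"verification failed for {target}")
            copied.append(str(target))
    return copied


# Example with hypothetical exclusion patterns, mirroring the nightly job:
# incremental_copy(r"c:\data", r"i:\data", ["\\.svn\\", ".obj"])
```

Because unchanged files are skipped on the modification-time check, a nightly run only copies (and hashes) what has changed since the previous night, which is what makes a scheduled job like this cheap enough to run every day.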

Which of the above do I do? Pretty much all of them, actually. At least, I try; I haven’t yet automated the off-site backup. This may seem rather excessive, but it paid dividends last month when gremlins went on the rampage here in the Oryx Digital office. I had 2 harddrive failures in 2 weeks. First the power supply, harddisk and network card on my old XP development machine failed. Then, while I was in the process of moving everything to my new Vista development machine, one of the RAID-1 disks on the new machine failed.

Things didn’t go quite according to plan though. The new RAID-1 box wouldn’t boot from either harddisk. I have no idea why.

Also, the last couple of weekly Acronis image back-ups had failed and I hadn’t done anything about it. I had recent back-ups of all the important data, but I faced a day or more reinstalling all the apps I had installed since the last successful image. It took several hours on the phone to Dell technical support and much crawling around on the floor before I could get the new RAID-1 box to boot off one harddisk. I was then able to rebuild the RAID-1 pair using the spare harddisk I had on standby for such an eventuality. Nothing was lost, apart from my sense of humour.

Dell offered to replace the defective harddisk under warranty, but I declined on the grounds that there is far too much valuable information on this disk (source code, digital certificate keys, customer details etc.) for me to entrust it to any third party, especially given that Dell reserve the right to refurbish the harddisk and send it to someone else. What if they forgot to wipe it? My experiences with courier companies also haven’t given me great confidence that the disk would reach Dell. And I didn’t want to receive a refurbished disk as a replacement. It just isn’t worth relying on a refurb given how cheap new harddisks are. So the harddisk has joined the back of the growing queue to see Mr Lump Hammer.

The availability of cheap harddisks and cheap bandwidth means that it has never been easier to back up your systems. No more fiddling with mag tapes. Of course it is possible that your harddisk will work perfectly until it becomes obsolete, but I think it would be very unwise to assume that this will be the case. Don’t say I didn’t warn you…

Further reading:

What’s your backup strategy? (the prolific and always worth reading Jeff Atwood beats me to the punch)

[1] RAID-1 is built into some Intel motherboards and is available as a relatively inexpensive extra from Dell. You may have to ask for it, though – it wasn’t listed as a standard configuration option when I purchased my Dell Dimension 9200.

[2] Since I wrote this article I have installed the latest version of JungleDisk on my Vista box. On the 3 occasions I tried to use it, it hung Vista to the point where I had to cut the power in order to reboot. I have now uninstalled it.