Your harddrive *will* fail – it’s just a question of when

There are a few certainties in life: death, taxes and harddisk failure. I have no fewer than 6 failed harddisks sitting here on my desk patiently awaiting their appointment with Mr Lump Hammer: 2 Seagates, 3 Maxtors and 1 Western Digital. This equates to roughly one disk failure per year. Perhaps this is not surprising given that I have about 9 working harddisks at the moment spread across various machines. Given the incredible tolerances to which harddisks are manufactured, perhaps it is a miracle harddisks work at all.

As an analogy, a magnetic head slider flying over a disk surface with a flying height of 25 nm with a relative speed of 20 meters/second is equivalent to an aircraft flying at a physical spacing of 0.2 µm at 900 kilometers/hour. This is what a disk drive experiences during its operation. –Magnetic Storage Systems Beyond 2000, George C. Hadjipanayis from Wikipedia

We all know we need to back-up our data. But it is a chore that often gets forgotten at the most critical periods. Here are my hints for preparing yourself for that inevitable ‘click of death’.

  • Buy an external USB/Firewire harddrive. 500GB drives are ridiculously cheap these days. Personally I don’t like back-up tapes due to experiences of them stretching and corrupting data.
  • Back-up images of the entire OS, not just the data. You can use Acronis TrueImage on Windows and SuperDuper on MacOSX. This can save you days restoring your entire development environment and applications from scratch.
  • Back-up individual files as well as entire OS images. You don’t want to have to restore a whole image to retrieve one critical file. Windows Vista and Mac OS X Leopard both have back-up applications built into the OS.
  • Use a machine separate from your development machine as your source code server.
  • Use a RAID-1 (mirrored) disk on your main development machine[1]. It is worth noting that this actually doubles the likelihood of harddisk failure, but makes the likelihood of a catastrophic failure much lower. Keep an identical 3rd drive on hand to swap in when a drive fails.
  • Back-ups aren’t much use if they get incinerated along with your office in a fire, so store copies off-site. For example you can:
  • Make sure any off-site copies are securely encrypted, for example using Axcrypt.
  • Automate your back-ups as far as possible. Computers are much better at the dull repetitive stuff.
  • Test restoring data once in a while. There is not much point backing up data only to find you can’t restore it when needed.
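The last tip, testing that a back-up can actually be restored, can be partly automated. Here is a minimal Python sketch, not anything used in this article: it walks a source folder and reports files that are missing from the back-up copy or whose contents differ, by comparing SHA-256 digests. The function names `file_digest` and `verify_backup` are made up for illustration, and it assumes the back-up is a plain folder copy rather than a proprietary image.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files under source that are missing from, or differ in, backup."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue  # only files are compared, not folders
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(copy) != file_digest(src_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling `verify_backup(Path("c:/data"), Path("i:/data"))` (placeholder paths) returns an empty list when every source file has an identical copy. Files changed since the last back-up will show up as differing, so it is best run straight after a back-up completes.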

There are lots of applications for backing up individual files. So many in fact, that no-one has any hope of evaluating them all (marketing tip: don’t write another back-up application – really). I also worry that data stored in their various proprietary formats might not be accessible in future due to the vendor going out of business. I find the venerable DOS xcopy adequate for my needs. I run it in a scheduled Windows batch file to automatically sync file changes on to my USB harddrive (i:) every night. Here it is in all its glory:

XCOPY c:\data i:\data /d /i /s /v /f /y /g /EXCLUDE:exclude.txt

The exclude.txt file is used to exclude subversion folders and intermediate compiler files:

\.svn\
.obj
.ilk
.ncb
.pdb
.bak
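For anyone not on Windows, the core of what that command does (copy only files that are new or newer than the copy on the back-up drive, skipping anything whose path contains one of the exclude strings) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a drop-in replacement for xcopy: `is_excluded` and `sync_newer` are made-up names, the exclude matching ignores case (an assumption on my part), and switches such as /v (verify) are not reproduced.

```python
import shutil
from pathlib import Path

# Strings from the exclude list above. As with xcopy's /EXCLUDE, a file is
# skipped when any of them appears anywhere in its path.
EXCLUDES = ["\\.svn\\", ".obj", ".ilk", ".ncb", ".pdb", ".bak"]


def is_excluded(path: Path, excludes=EXCLUDES) -> bool:
    """True if any exclude string occurs in the path (case ignored here)."""
    text = str(path).replace("/", "\\").lower()
    return any(pattern.lower() in text for pattern in excludes)


def sync_newer(source: Path, dest: Path, excludes=EXCLUDES) -> list[Path]:
    """Copy files missing from dest, or newer in source; return the copies made."""
    copied = []
    for src_file in source.rglob("*"):
        if not src_file.is_file() or is_excluded(src_file, excludes):
            continue
        target = dest / src_file.relative_to(source)
        if target.exists() and target.stat().st_mtime >= src_file.stat().st_mtime:
            continue  # the back-up copy is already up to date
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, target)  # copy2 keeps the modification time
        copied.append(target)
    return copied
```

Because `shutil.copy2` preserves the modification time, a second run straight after the first copies nothing, which is what makes a nightly run cheap. Note that, as with the exclude list above, `.obj` matches anywhere in the path, not just as a file extension.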

Which of the above do I do? Pretty much all of them actually. At least I try; I haven’t yet automated the offsite backup. This may seem rather excessive, but it paid dividends last month when gremlins went on the rampage here in the Oryx Digital office. I had 2 harddrive failures in 2 weeks. The power supply, harddisk and network card on my old XP development machine failed. Then, while I was in the process of moving everything to my new Vista development machine, one of the RAID-1 disks on the new machine failed.

Things didn’t go quite according to plan though. The new RAID-1 box wouldn’t boot from either harddisk. I have no idea why.

Also the last couple of weekly Acronis image back-ups had failed and I hadn’t done anything about it. I had recent back-ups of all the important data, but I faced a day or more reinstalling all the apps I had installed since the last successful image. It took several hours on the phone to Dell technical support and much crawling around on the floor before I could get the new RAID-1 box to boot off one harddisk. I was then able to rebuild RAID-1 using the spare harddisk I had on standby for such an eventuality. Nothing was lost, apart from my sense of humour.

Dell offered to replace the defective harddisk under warranty, but I declined on the grounds that there is far too much valuable information on this disk (source code, digital certificate keys, customer details etc) for me to entrust it to any third party. Especially given that Dell reserve the right to refurbish the harddisk and send it to someone else. What if they forgot to wipe it? My experiences with courier companies also haven’t given me great confidence that the disk would reach Dell. And I didn’t want to receive a refurbished disk as a replacement. It just isn’t worth relying on a refurb given how cheap new harddisks are. So the harddisk has joined the back of the growing queue to see Mr Lump Hammer.

The availability of cheap harddisks and cheap bandwidth means that it has never been easier to backup your systems. No more fiddling with mag tapes. Of course it is possible that your harddisk will work perfectly until it becomes obsolete, but I think it would be very unwise to assume that this will be the case. Don’t say I didn’t warn you…

Further reading:

What’s your backup strategy? (the prolific and always worth reading Jeff Atwood beats me to the punch)

[1] RAID-1 is built in to some Intel motherboards and is available as a relatively inexpensive extra from Dell. You may have to ask for it though – it wasn’t listed as a standard configuration option when I purchased my Dell Dimension 9200.

[2] Since I wrote this article I installed the latest version of JungleDisk on my Vista box. On the 3 occasions I have tried to use it, it hung Vista to the point where I had to cut the power in order to reboot. I have now uninstalled it.

60 thoughts on “Your harddrive *will* fail – it’s just a question of when”

  1. Phil

    Wow what are you doing to these poor hard disks? On the other end of the probability scale, I have not had a single hard drive fail in 12 years, knock on magnets. (Including my two servers each with 3 hds, my 2 laptops, 4 pcs, 3 macs and one backup drive I’ve had in that time).

  2. WR

    Xcopy is ok, especially with your indicated switches, but have you tried robocopy? It comes bundled with the freely downloadable server 2003 resource kit. I’m a huge fan.

    -WR

  3. tndal

    I’ve never had a drive fail. I’ve always retired them in good condition. The average lifespan prior to retirement has been 4 years. Seagate, Maxtor, Western Digital: all the same. I fail (to wait) well before they do!-))

  4. TN

    Interesting that you noted having a majority of failed Maxtors. Perhaps not surprisingly, that has also been my experience. I’ve been fortunate enough to evade HDD failures for the most part, and the ONLY brand name of hard drive I’ve ever had fail on me was Maxtor (3 hard drives in the last few years).

    One of these drives hadn’t been backed up in quite some time, and important data had just been added to it. As a tip to fellow readers, there is an application called GetDataBack from Runtime Software that I was able to use to restore the entire contents of that disk. It cost about $79 at the time. I was able to run the software free to verify what could be restored and once I could see it would work, I paid for the license. Maybe give it a shot on some of your drives?

    Nice tips, and thanks for the suggestions on software. I’ve been using SyncBack free version, but I’m seeking other solutions as well. I like it a lot but UAC in Vista seems to prevent the backup from running on occasion. I’m also only running backups on data and really need to look into a solution for imaging–preferably something versatile, lightweight, and doesn’t require a full-blown installed application. I’ll probably look into BartPE or something as I think that might do what I need it to. I used DriveImage back in the day and loved it because I could re-image my entire PC in less than 5 minutes, and it didn’t require me to have software installed since it was all run from a bootable CD. I did buy Ghost not long ago but it was far too bulky for me.

  5. Andy

    You might want to look into how clean your electricity supply is, as this is supposed to have an impact on hard drive reliability. If you aren’t already running from a UPS or filter, it might be a good time to look into buying one.

  6. Smartass

    Encrypt your harddrives. Not at a file or image level. Encrypt the entire drives. Then you can have drives replaced or even stolen without worrying about your data. There’s no good reason not to. That many OSes don’t have good support for encryption out of the box is a true shame.

  7. John

    That’s an impressive xcopy command you have there. Have a look at xxcopy, it copies files which are open, much easier as you can just let it go and then carry on with your work.

  8. Andy Brice Post author

    I have started using truecrypt to encrypt all the sensitive data on my laptop. I am not sure about the performance penalty for encrypting the entire disk on my development machine – any data?

  9. MarcoVincenzo

    I too have to wonder what you’re doing to the poor things. In the last 17 years I have had 4 bad hard drives, 2 Maxtors (late 90s and they really were junk–but I still will never again buy a Maxtor), 1 Western Digital 120 failed at around 5.5 years old, and one Western Digital 400 arrived from Newegg DOA–its replacement is working just fine 18 months later. That’s 4 “bad” drives in 17 years out of maybe 70+ total drives.

    On my home machines I’ve got 8 250GB in RAID 5 on my server, 5 400GB in my XP development machine, 3 1TB in my desktop machine, 3 74GB Raptors and 2 36GB Raptors as system drives–all Western Digital. 21 drives currently active and no problems in the last 5 years except for the DOA. Hard drives do fail, usually at the worst possible time, but you’re having way too many failures. I’d suggest looking for causes other than the drives themselves.

  10. Andy Brice Post author

    It is true that I have had quite a few failures. But they failed in various different machines at 2 different locations (some are from my old job), so it is hard to see a common cause. I don’t live in a dusty area next to a magnet factory. ;0)

    I do have surge protectors in the blocks I plug the PCs into. I am thinking about getting a UPS/filter as well.

  11. LabThug

    I see the defective drives in your image are Seagate’s ST3500630AS. These drives have been utter trash for me (check my blog). I bought five in July and have already had *six* replacements. While I wish you the best at recovering your data, I will warn you that it’s going to be a painfully long road until you get something from them that’s trustable. It’s a shame though, I used to consider them the Gold Standard in HD quality. The mighty have certainly fallen.

    Good luck!!!

  12. Stefan Zeiger

    If you have a 5.25″ bay and a SATA port to spare in one of your machines, I suggest getting a trayless SATA rack like the RaidSonic IB-169SK-B that I’m using instead of external USB or Firewire drives. While “raw” disk drives need to be handled more carefully than an enclosed external drive, I still find the handling simpler. Just push the drive in and close the door instead of connecting both a data cable and a power cable. You also avoid the overhead of USB and FW (and possible data corruption which I’ve seen on more than one machine when transferring several hundred GB over USB or FW!). And it scales easily because you can just add more SATA drives.

    You suggested to back up individual files because “you don’t want to have to restore a whole image to retrieve one critical file”. While backup for a small number of files may be faster that way, you can also restore single files from a disk image very quickly (at least with Acronis True Image). You can even mount the image and just copy out whatever files you need.

  13. she

    so far i had only two hdd fail.
    i used about 30, so 28 are still working (although i discarded some older ones, like a 8 gig hdd… ancient times man…)

    i agree on external hdd’s though
    they are cheap and more reliable than DVD or HDVD or whatever the DRM infected crap is called these days

  14. Pingback: The PC Weenies. V2.0: Tech ‘Toons for Tech Enthusiasts. Squared. » Archive » Tuesday plugs…

  15. Ilya Sitnikov

    My own observation is that computers live for 5-6 years on average and hard drives live for about 3 years or so.

    Maybe it’s different for newer computers/hard drives, but it used to be that way.

    :-)

  16. sansoucy

    Thanks for this, my hard drive just failed on my emachine, and unfortunately I learned the hard way about backing up all of my material. :-(

  17. Stephane Grenier

    Absolutely. I’ve got more than a dozen failed drives floating around my office, all of which failed within the last 4-5 years.

    But the good news, knock on wood, I haven’t really lost any critical data! Like you say, it really pays to prepare because it’s not a matter of if, it’s a matter of when.

    And I absolutely agree about mirrored Raid. 100% of my disks are mirrored, on all machines. Even test boxes. The reasoning is that the time it takes to bring back a machine to life from a disk failure is more than the cost of the additional drive required to create the mirrored Raid!

    And because you asked, here’s a link to my own backup strategy (which is pretty similar to yours): http://www.followsteph.com/2006/07/03/4-simple-steps-to-protect-your-data-from-999999-of-all-computer-failures/

  18. Ilya Sitnikov

    Then there’s also that thing with corrupted Windows profiles, where all your settings are lost.

    It can be fixed somehow and it can’t be compared to hard drive failure, but it’s still unpleasant.

  19. Charlie

    You asked about a full-disk encryption product- After a mandate was handed down by the State, here at UT Health Science in Houston our sysadmins settled on SafeBoot. I do a -lot- of 3D modeling & game developing on this machine, as well as .Net development, and was really concerned about response times.

    Our grunt systems guy had a tough time getting some of the Gateway laptops running after converting to the product (including mine), but eventually, all 500+ Dell / Gateways made it. After using this for about a year now, I can’t say I notice much of a slowdown, if any, while working, and haven’t had any other problems.

    I have Zero experience with any other products, but I can say that SafeBoot has worked here.

  20. Pingback: links for 2008-02-06 « Donghai Ma

  21. Pingback: links for 2008-02-06 | Lazycoder

  22. kger

    Boris, a good-sounding open source encryption utility is FreeOTFE [http://www.freeotfe.org/]. Its feature list says, “Encrypted volumes may either be file, partition, or even disk based”.

  23. h

    – had a deathstar go at 2 years, in the deathstar days (dell replaced).
    have had 2 to 3 pcs running near 24 hours for last few years. so far no failing noises.

    – erratically i run a batch, compressed file backups to usb drive. most of what i have i’ll likely never look at again, but that’s true of most ppl’s data. :-)

    – i’ve restored a few things like app settings, otherwise i’ve verified hardly any archive files.

    – i like the idea of sending to friend (mutual thing), but no one i know bothers backing up.

    – backups i began at work were valuable, when computer failed months later. naturally, bosses never notice employee prescience …

    – i hate when good parts go ewasters, but don’t know why anyone would want your certifiable dead drives…

  24. rubinelli

    I hadn’t had any issues with hard drives until last year, when I had two failures. I had backups of all the important files due to the constant prodding of my wife, who thankfully never shared my faith in technology. I know extrapolating from my experience isn’t logically defensible, but I have the feeling manufacturers are trading reliability for capacity and speed nowadays. I bet most of your hard drives from the 90’s were retired due to size, not failure.

    Solid-state drives will become the norm in a few years. They have Moore’s Law on their side.

  25. mCw

    Yes; though none of my hard-drives have died completely yet, I _always_ back up at least all of my code, to a USB flash drive. It takes only a few seconds, and I know that if my HDD were ever to fail and all my code be lost, I would _die_.

    I should probably be writing to a floppy disk as well, but I don’t bother…

  26. bayoujim

    In my 38 years of working with computers (mainframes to pc) I have seen something I must speak about. I can tell you for a fact that the harddrives built today are being built with a calculated, limited, life span.

    I noticed this change a few years ago. Before that, harddrive failure was unheard of, it was the last thing one thought about when something went wrong with the pc, they used to last much longer than the rest of the computer.

    Today I get about 2 years before harddrive failure.

  27. Darkstriker

    Very good article indeed. Maybe I should have listened to something like this about a year ago when first one of my raid0 drives broke (as in physically one of the sata connectors just came off with part of the hdd) and then my other maxtor had bad sectors all over about 3 months later. I do, however, think that it is very important to appreciate that most failures are due to the incompetence of the producer. Maxtor as well as seagate seem to make very low quality hdds compared to hq producers such as Western Digital. I have had 10 maxtors & seagates (a mix of 5:5) and all have failed me within 1 year of buying them. I have currently 12 WDs running of which 1 is 7 years, 2 are 5 years, 6 are 4 years and 5 are 3 years old and they never failed me, i never had even one bad sector and especially regarding the 7 year old one that is an incredible performance realizing that it was overheating on a regular basis (ca 10 times a day) for 2 years until I installed a drive cooler.

    thanks for the good stuff

  28. GFYM

    No wonder 3 of them were Maxtors.
    Got myself 2 brand new Maxtors that crashed after a few days.
    Never more Maxtor.

  29. Doug Rosbury

    Mechanical systems or components are all too subject to flaws and wear and who knows what. why in a “high tech” system such as a computer
    would anyone design into it a failure prone component such as a mechanical hard drive(???) How about eliminating “hard drives” and finding a true high tech solution. (Huh???) How about it Dell or
    any of you techies out there(???) Please, just do it(!!!).
    PLEASE)>DOUG ROSBURY

  30. Andy Brice Post author

    Perhaps part of the problem is that HDs are so cheap. This must put a lot of pressure on manufacturers to cut corners. Personally I would happily pay double for a HD that was significantly and demonstrably more reliable.

  31. Steve

    To askbusinesscoach

    I have never used SpinRite but stumbled upon very critical evaluations, sometimes calling it downright useless. You seem to have some experience with it, but “could take a few days” doesn’t sound like it would be useful for any emergency.

  32. Dave

    I’ve had this laptop for almost two years now, and it’s still running nicely. I’ve reformatted it three times now. Going from the start… a 386 for two years, a P2 for six years, and a P4 for three, which went from WinME to WinME to WinXP, reformat in between all… never had a failed hard drive. You must physically abuse them.

  33. Evan

    Out of 20-some computers with HDDs, in the last 5-10 years, all owned by me, I’ve retired all but two of the machines at this point and in all that time I’ve only had two HDD failures. Both were Seagate drives. I don’t know if it’s just bad luck with the Seagates, but I can tell you I’m not exactly rushing out to buy more of them.

  34. Evab

    Doug Rosbury (above) will be getting his wish!

    Hard Disks will only be with us for a very few more years. Less than 10 more years for sure. Already, solid-state alternatives are available and are much faster and much more reliable. The only problem right now is PRICE. But, the price is already coming down and within 10 years it will be cheaper to have an entirely solid state, non volatile long-term storage device replace all hard drives.

    At first, we’ll still have separate fast RAM and slightly slower long-term storage, but as the years continue to fly by, we will see some major changes, petabyte memories where RAM and longterm storage are one and the same.

    Hard drives are ALMOST a thing of the past now, just hardly anybody knows it yet…

  35. Eric

    Excellent post. I recently “inherited” a network which has been running for a good five years using only tape backups (I have created some critical backups elsewhere). I was wondering if anyone has had any experience using online backup sources (bandwidth is cheap!). Currently I have an offsite 7-day tape system, picked up weekly, but was curious if anyone has thought of moving to, or researched, 3rd party online backups and their cost compared to hard media.

  36. TrueXRT

    Excellent post. I have worked with many, many drives over the years and backups are critical. Here is my history with drives across my systems, covering both personal and production use:

    Brand            Owned  Failed
    Maxtor              36       3
    Seagate             49       7
    Western Digital     28      21
    Samsung              6       1
    Hitachi              2       0
    Quantum              3       0
    Toshiba              1       1

    HDs in production use I retire after 4-5 years.

  37. J. Taylor

    I’ve yet to have a failure either … 2 of my boxes are over 6 years old, and one is around 4 now … are you going offroading while transporting your computers?

    –J. Taylor

  38. Pingback: Successful Software

  39. BC SEO

    I have had 3 failures, in 10 years.
    Backup daily, is my mantra. I even have a bright pink sticker taped to my monitor, beside the power button that says BACK UP.

  40. Donna Barron

    How do you know if you really need an image backup? Consider how much you value your data, how much time (and money, if you have to pay someone else to do it) it will take to rebuild a new drive if yours fails and how inconvenienced you will be until your computer is up and running again. With external backup drives becoming increasingly affordable, even home computer users can now enjoy the security that drive to drive backup software can provide.

    Restoring from an image backup eliminates all the work and time normally associated with rebuilding a new drive. When you restore the image backup to your new drive, not only all of your data files but all of your applications, your preferences, your latest hardware drivers and even your drive partitions are all transferred to your new drive. If you are restoring a boot drive image, the new drive will be bootable as well. The actual time involved in transferring the data to the new drive will depend on the size and speed of the drives.

    ____________
    Donna Barron is communications director for Data Protection Solutions by Arco, publisher of EzBackup and EzMigration software. Download a free trial of either software: http://www.arcoide.com

    drive backup technology products: http://www.ezd2d.com/ezbackup_and_restore.php

  41. Pingback: “Think you can’t get a virus by visiting a web page? Think again!” « Successful Software

  42. messed-up

    Had one bad harddisk in 15 years, which was a DOA. About 50 harddisks from all manufacturers, including quite a lot of Maxtors, have been working fine until i replaced them with newer, bigger and faster ones.
    I have two 160 GB PATA Maxtors running 24/7 downloading from P2P and usenet, not one bad sector. None of my internal drives which run 24/7 did fail. Cheap external drives often have a bad powersupply, bad controller and contain cheap harddisk versions. If you really want an external one, buy a disk and a separate external case from a major brand, not the cheap rubbish.
    I really don`t understand what the hell people do with their harddisks to make them fail. Bad powersupply, throwing them against the wall ?

  43. Pingback: Harddisk woes « Successful Software

  44. Pingback: Dropbox is saving my bacon: MicroISV on a Shoestring

  45. online bacfkup

    It’s an excellent post! Online backup has become one of the most important parts of backing up your critical business information.

  46. Pingback: How good are your backups? « Successful Software

  47. JohnnyBoyClub

    Well, he has a point: the hdd will fail, but you can lengthen the life of your hdd by using different programs and also keep it protected from viruses that might harm your hdd.

    Also, to keep your important data safe and secure you should always use backup software. A good one for that is http://www.dmailer.com/dmailer-backup.html, which is free software that you can use and which can also save your data online on their servers.

  48. Pingback: 10 things non-technical users don’t understand about your software « Successful Software

  49. Pingback: I’m a millionaire! « Successful Software

Comments are closed.