DevExpress Newsletter 8: Message from the CTO

12 August 2009

My Message from the CTO in our eighth newsletter:

Back in 2007 Google released a study on the lifetimes of hard disk drives. They are obviously uniquely placed to do so, since they run the huge server farms that handle the majority of the world's searches, emails, and other cloud-like services.

Although they didn't single out any particular manufacturer or brand as being better or worse than the norm, they did say that a new disk drive has about a 2% chance of failing in a year. For drives over 2 or 3 years old, the rate is significantly higher: over 8%.

Let's take that lower probability, and run with it.

So, the disk in your newish laptop has a one in 50 chance of dying this year. Doesn't sound too bad, surely? That's about the same odds as shuffling a pack of cards thoroughly, placing the pack on the table, cutting it, and turning over the top card to find it's the ace of spades.

Taking it further: in our house we have several PCs, containing 8 newish disk drives in all, spinning all the time. Using Google's figure, the probability of at least one of those 8 dying this year is 15%.

One in 7. That's somewhere between tossing three heads in a row with a coin (1 in 8) and throwing a six on a single die (1 in 6). Suddenly it doesn't seem that remote any more. (If the drives were older, it becomes roughly 1 chance in 2 that at least one would fail.)

So, when was the last time you backed up? And if, like me, you're the free tech gopher for all your relatives' and friends' machines, when did they last back up?

OK, so a message only tangentially related to development this time, but, as you might have guessed, it came up for me recently in the "tech gopher" sense. Ever since my wife lost a lot of data because of a crashed hard disk and I discovered the latest backup was a couple of months old, I've been fairly obsessive about backups (you would be too, if you'd heard the words in the Bucknall household that day). For friends and relatives, I've been trying to coerce them into using something like JungleDisk or Mozy so that at least their documents would survive.

(By the way, for those who wonder how to calculate these probabilities, here's the math. If a given disk drive has a probability of 0.02 of failing this year, it has a probability of 0.98 of staying alive. The probability of all 8 drives in our house staying alive this year is 0.98^8, or about 0.85. So the probability of at least one failing is 0.15, or 15%. For a 0.08 probability of dying per drive, there's a 0.92 probability that it survives, so for 8 drives that's 0.92^8, or 0.51, for them all staying alive, or 49% that at least one will die this year. A coin toss. (I'll note that Google's paper gives 0.017 as the probability of a drive failing in the first year; I just rounded up to make the point.))
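The arithmetic above is easy to check with a few lines of Python. This is a minimal sketch that, like the post, assumes each drive fails independently of the others with the same yearly probability. The function name is just an illustrative choice.

```python
def prob_at_least_one_failure(p_fail: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails in a year,
    given each drive has probability p_fail of failing that year."""
    p_all_survive = (1.0 - p_fail) ** n_drives  # every drive has to survive
    return 1.0 - p_all_survive                   # complement: at least one dies


# 0.017 is the unrounded first-year figure; 0.02 and 0.08 are the post's numbers.
for p in (0.017, 0.02, 0.08):
    chance = prob_at_least_one_failure(p, 8)
    print(f"p = {p:.3f}: {chance:.1%} chance that at least one of 8 drives fails")
```

With the post's 0.02 and 0.08 figures and 8 drives, this reproduces the hand-calculated 15% and 49%.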

24 comment(s)
Twitted by jrguay

Pingback from  Twitted by jrguay

12 August, 2009

Honestly, not one hard drive belonging to my friends and relatives has broken in the past ten years. Granted, their PCs are usually replaced every 5-7 years, but I still see the chance as quite low... Make backups anyway!

13 August, 2009

Something is a little off in your probability counts.

2x8 = 16 not 15.

And 8x8 = 64, which makes it a 1 in 1.56 chance that one of your old drives will die this year, not 1 in 2.

13 August, 2009

So how do you back up? Seriously... We do our backups here to removable hard drives. I bet the failure rate on those is a lot higher, since they get moved around; even setting one down on a table a little too hard might shake a drive head. Granted, we use multiple removable drives, so if one fails we have options. Plus, 2 of our servers back up to each other, giving a second restore option. Tape drives are definitely out of the question, especially for home use. Their pricing and reliability (in my experience) just make them not worth it.

So what's the best option for home use when you have, say, over 20 gig worth of photos? I've been using DVDs, but even then people say that they degrade over time. It's not like I use these files daily, so I need long-term storage...

13 August, 2009

Please don't tell me you leave all your PCs on all the time even when not in use?

Don't give me that 'wears them out' bunkum. I've turned mine off every night and put them to sleep regularly with no issue. This is a computer, not a 1970s Chevy.

Come on, get green!

13 August, 2009
William Vaughn

In my experience, drives typically fail about 5 years into their service life or 90 seconds after their warranty expires. I always buy the 5-year warranty drives, as the lifetime of the cheaper drives seems to be significantly shorter. Consider that replacement drives are often "remanufactured" and might not last out the last year of the warranty. Heat, use, vibration (induced into the cabinet by anything), high-G shock and other factors also shorten life. I speak from 35+ years in the business: losing the boot drive from your domain controller (as I have), only to discover that the Acronis backups are "corrupt", can wreck your month.

13 August, 2009
Mark O'Neal

Nice article, Julian. I would just like to add the importance of auditing your backups. We have over 70 customers using the Mozy backup service, and we monitor these backups for an additional service fee. Our monitoring has shown that on any given day, up to 10% of the backups are flawed in some way. Usually it is because the user has a dodgy internet connection. Sometimes, for unknown reasons, Mozy just stops sensing files that have changed and need to be backed up. In this latter case, the solution has always been to uninstall and reinstall the Mozy client.

My point is, no backup solution is perfect and you have to stay on top of yours and make sure that your backups contain what you expect.

13 August, 2009
Mark Hatch

Great post and spot on. To get the disclaimer out of the way: I'm the VP of Software Development at Intronis, and we provide online backup services (and use DevExpress too, which is how I found this post). Because we have so many disks spinning at any given time, I can tell you that those percentages are accurate. Even we do not rely on a single array configuration; we back up our backups redundantly, in two data centers on opposite coasts. Disks are mechanical items that fail. It's not a question of IF, it's a question of WHEN. And any service that fluffs the question of a backup of their backups isn't really taking these events seriously, so make sure you ask that question.

So when the disk fails, as it eventually will, you need a backup. And we all (me included) are bad about manual backups, so setting up an online backup to keep you safe is by far the best option in all failure scenarios. Especially for all you coders... I can't imagine finding out the last backup of my code was a month ago! Keep your data safe out there.

13 August, 2009
Richard Thor

Backup is the first thing you should think of, and the last thing you actually do. I use an external HD to back up my projects (when I think of it), and usually just ZIP up the files.

However, a few weeks ago, I synced with my new HTC phone, and it randomly trashed my contacts. Guess what... no backups. So I am now using iDrive to save it all.

Why is it we tell others the best things to do, and leave our own data to the last?

- Richard.

13 August, 2009

I suspect those failure rates are for drives in heavy use. Our company has 30 machines; they were all replaced when they were at least 5 years old, and the new machines are probably a year old now. There have been no HD failures. However, we have 3 servers and have had 2 drives fail in the past five years. So I don't really think those stats apply to PCs not used as servers.

13 August, 2009

The problem with backups is that data from one hard drive is typically backed up to another hard drive, which can also fail. This happened to me recently at home when one of the hard drives in our NAS (4 x 250 GB) failed. Luckily, we run the NAS in a RAID configuration and thus did not lose any data. Until we are able to move to solid state "drives", I will always feel less than secure doing backups to hard drives.

13 August, 2009
R. Medeiros

Mark's last paragraph is absolutely correct! I know people who thought they had everything they needed to restore an application (like a SharePoint app), and they didn't, so the app could not be restored.

Or even that file you needed so much didn't make it into your backup!

So, please do as Mark said: make sure your backups contain what you expect, and if they do, try restoring them as a test to make sure they are all right.
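One way to check that a backup "contains what you expect" is to compare checksums of the original files against the backed-up copies. The Python sketch below assumes the backup, or a test restore of it, is a plain folder-for-folder copy on disk. It cannot look inside a disk image or a vendor's archive format, so for a product like Mozy or Acronis you would restore to a scratch folder first. The function names are hypothetical. Any file edited since the last backup will legitimately show up as mismatched.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large photos don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Compare every file under source_dir with its copy under backup_dir.

    Returns (missing, mismatched): sorted lists of paths relative to source_dir
    that are absent from the backup, or whose contents differ.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    missing, mismatched = [], []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue  # directories are implied by the files inside them
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            missing.append(rel)
        elif file_digest(src) != file_digest(copy):
            mismatched.append(rel)
    return sorted(missing), sorted(mismatched)
```

For example, `verify_backup("C:/Users/me/Documents", "E:/Backup/Documents")` (hypothetical paths) would list anything that never made it into the backup or that differs from the original.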

13 August, 2009
Julian Bucknall (DevExpress)

Mark: An excellent point, and a very interesting gotcha with Mozy that I didn't know about.

Cheers. Julian

13 August, 2009
Julian Bucknall (DevExpress)

Ivan: I would reread the last paragraph of the post to understand how to calculate the probabilities.

Tomas: I've had two drives die in the past five years or so, both of them in my wife's laptop. The first time, I had a pretty recent backup, and I slapped myself on the back for my prescience. When the next one died, I was well and truly caught with my pants down. Now, why it's the drives in my wife's laptop that are prone to dying, I'm not going to speculate...

Aaron: I automatically back up everything on my main internal drives to an external drive once a day, using Acronis TrueImage. It's a gamble (what if the internal and external drives die at the same time?), but the failure possibility would be pretty remote (0.04%, according to Google's stats). I also back up all documents (incl. photos) to another PC and to the cloud (I use JungleDisk and Amazon S3). Ditto for my wife's data. (So all data files are on four different drives.) I used to use CD-ROMs and DVD-ROMs, but I couldn't be bothered with the disk swapping.
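The 0.04% figure is the chance that both drives fail in the same year, assuming they fail independently: 0.02 x 0.02 = 0.0004. Here is a minimal Python sketch of that multiplication. The function name is hypothetical. Treating every copy, including the cloud one, as a drive with the same 2% yearly failure rate is a simplifying assumption. Independence is an assumption too: shared causes such as theft or a power surge are not modelled.

```python
from math import prod


def prob_all_copies_lost(p_fail_per_copy: list[float]) -> float:
    """Chance that every copy fails in the same year, assuming the failures
    are independent (shared causes like theft or a surge are not modelled)."""
    return prod(p_fail_per_copy)


two_drives = prob_all_copies_lost([0.02, 0.02])  # internal + external drive
four_drives = prob_all_copies_lost([0.02] * 4)   # four copies, same assumed rate
print(f"internal + external: {two_drives:.2%}")
print(f"four copies: {four_drives:.2e}")
```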

Jim: My desktop is also the household file and print server, so it's on all the time. Thinking of getting a WHS machine instead (smaller, less powerful, uses less juice).

William: loved the throwaway line about the drive dying just after the warranty expires. Been there with the broadband router I use...

Cheers, Julian

13 August, 2009
Jean-Luc Praz

My last complete local backup was done an hour ago; online backup some time earlier in the day. Why? The contents of my HD are what drive my business, so I can't afford to lose them.

Not that I am immune to data loss: about 3 months ago my laptop's HD died after 3 years of use. That is when I noticed that I had some important data on it; until then, I didn't think so.

HD failure is one part of the picture. What about stolen PCs? About 5 years ago somebody broke into my office: gone were the PC, the flat screen (not cheap back then), the iPaq and the digital camera. Thank god the external drive, containing a fresh backup, was not in the same place. Since then I have also been talking to all my clients about the backup issue.

I usually start like this: "What would happen to your business if one day you found out that your PC or laptop was just gone?"

13 August, 2009
Venu M

Good point to note, and yes, I have turned over quite a few aces in the last year with one particular brand of hard drive.

I am not sure the Google study covers solid state drives. What is the likelihood of SSDs crashing?

13 August, 2009
Julian Bucknall (DevExpress)

Sanjeev: since SSDs are solid state, no moving parts, they'd be like your motherboard crashing, or your memory going bad or something. The issue with SSDs is that they "wear out" (the flash memory can only be written to ~10,000 times) and there's quite a bit of circuitry present that tries to balance out the "wear" of the flash memory chips across the whole drive. For a drive that's being written to constantly, you'd find the total capacity slowly being reduced as the flash memory cells start to "wear out".
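The "wear balancing" idea can be made concrete with a toy Python model. This does not describe how any real SSD controller works. In the model, each write goes to the least-worn block still in service, and a block is retired once it reaches a fixed write limit, which stands in for the ~10,000-write figure above. The function name and parameters are hypothetical. In this idealised model, spreading the wear evenly means no block is retired until all of them are close to the limit. After that point, usable capacity starts shrinking.

```python
import heapq


def simulate_wear_leveling(n_blocks: int, endurance: int, n_writes: int) -> int:
    """Toy wear-leveling model: every write goes to the least-worn block still
    in service, and a block is retired after `endurance` writes.
    Returns how many blocks are still usable after n_writes."""
    heap = [(0, block) for block in range(n_blocks)]  # (writes so far, block id)
    heapq.heapify(heap)
    for _ in range(n_writes):
        if not heap:
            break  # every block has worn out
        wear, block = heapq.heappop(heap)  # least-worn block in service
        wear += 1
        if wear < endurance:
            heapq.heappush(heap, (wear, block))
        # otherwise the block has hit its limit and is retired (capacity shrinks)
    return len(heap)


for writes in (500_000, 999_900, 999_950, 1_000_000):
    print(writes, simulate_wear_leveling(n_blocks=100, endurance=10_000, n_writes=writes))
```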

Cheers, Julian

13 August, 2009

RAID cards and extra hard drives are cheap as chips. I tend to run "App" and "Data" drives and make sure that all the software I need to rebuild the "App" drive is to hand on permanent media. If you cost up the time you'd spend rebuilding, then a RAID setup (even on a home PC) pays for itself in nothing flat.

As for being the tech gopher, try ringing a few friends at 9.30 p.m. now and again to ask for urgent help in their professional specialty. My buddy the Spanish teacher eventually got the message after a lengthy discussion of Castilian verb tenses late one night. (Don't try this with health professionals; you'll find out things you never wanted to know...)

14 August, 2009
Jesse McClusky

Can I just say this is the reason I *love* Windows Home Server? I've been running one in my home with 5 drives for just over 2 years now (since the beta), and I've already gone through a seamless hard-drive failure, with no data loss. WHS has to be far and away the best product MS has ever produced.

14 August, 2009
Donn Edwards

I use *HDTune* on my laptop to prevent the drive from overheating. I have lost an entire drive because of a motherboard fault that caused it to overheat. I literally burnt my fingers.

I use *Spinrite* every 3-4 months to check the status of each drive and test the surface. It has prevented several disasters.

Finally, I use a USB external drive for backups and archives. And I use *Acronis TrueImage* to make a complete disk image of my laptop, so if I'm struck by a virus or drive failure I can get my apps, registry settings and data back.

I've had too many close calls to NOT do backups, and with laptop theft a common occurrence, I can't afford to lose all my data.

14 August, 2009
Corey Kosak

I think the most sensible "should I back up?" metric is not the probability of loss, but rather the expected value of the loss.  All my work is checked into source control (elsewhere), so for me the expected loss is the last few hours' work, plus the cost of an OS and app reinstall, times two percent, which makes it acceptable to do no additional backups.  However, when I tell my mom that in the coming year, there is a 2% chance that she will irretrievably lose all of her precious photos, documents, and records, that same 2% number doesn't feel "low" to her at all.
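Corey's expected-value argument can be written down directly. The expected yearly loss is the failure probability times the cost of the loss, and backing up pays when that exceeds the yearly cost of doing backups. The sketch below is purely illustrative: the function names and all the cost figures (in arbitrary units, say hours of effort) are made up.

```python
def expected_loss(p_fail: float, cost_of_loss: float) -> float:
    """Expected yearly cost of a drive failure, in the same units as cost_of_loss."""
    return p_fail * cost_of_loss


def backup_worthwhile(p_fail: float, cost_of_loss: float, cost_of_backups: float) -> bool:
    """Back up when the expected loss exceeds what the backups cost per year."""
    return expected_loss(p_fail, cost_of_loss) > cost_of_backups


# Hypothetical figures: a developer whose code is in source control loses a few
# hours' work plus an OS/app reinstall...
developer = backup_worthwhile(p_fail=0.02, cost_of_loss=12, cost_of_backups=5)
# ...while irreplaceable photos, documents and records are valued far higher.
mom = backup_worthwhile(p_fail=0.02, cost_of_loss=10_000, cost_of_backups=5)
print(developer, mom)
```

The probability is the same 2% in both cases. Only the cost term differs, and that is why the same number feels acceptable to one person and alarming to another.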

15 August, 2009
Bernd Maierhofer

I too am one of the lonely fighters for complete, frequent and automated backups. That said, I want to draw your attention to the fact that the given statistics may be incorrect. As the probability of one disk failing is independent of another disk failing, you must not multiply probabilities but add them. So "The probability that at least one of those 8 dying this year according to Google's probability is 15%" must read "is 2%". But of course disks live in a kind of paradise at Google's: no changing temperature, climate control, ..., nothing we have at home.

Cheers Bernd

18 August, 2009
Algorithms for the masses - julian m bucknall

I dashed off a quick "Message from the CTO" about disk drive failure probabilities and backups for the eighth DevExpress newsletter, and, for such a quick message, it really resonated with the customers. It's always the way: it seems the

19 August, 2009
Julian Bucknall (DevExpress)

Bernd: I think you're mistaken there. Agreed the probabilities are independent, but you must multiply them.

Consider the analogy with dice. With one die, the chance of getting a 6 is 1/6. So the chance of NOT getting a 6 is 5/6. If I have two dice, the chance of NOT getting a six at all is 5/6 * 5/6, or 25/36. The other 11/36 of the probability is taken up with a six appearing on one (or both, of course) of the dice.
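The dice analogy can be checked by brute force. The short Python sketch below enumerates all 36 equally likely rolls of two dice and counts those with at least one six. It also shows where simply adding the probabilities goes wrong.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
at_least_one_six = sum(1 for a, b in outcomes if 6 in (a, b))

p_enumerated = Fraction(at_least_one_six, len(outcomes))
p_multiplied = 1 - Fraction(5, 6) ** 2         # 1 - P(no six on either die)
p_added = Fraction(1, 6) + Fraction(1, 6)      # the "just add them" approach
print(p_enumerated, p_multiplied, p_added)
```

Enumeration and the multiply-the-survival-probabilities method both give 11/36, whereas adding gives 12/36 because the (6, 6) roll is counted twice. The same double counting is why 8 x 2% = 16% slightly overstates the 15% figure for eight drives, and why 8 x 8% = 64% badly overstates the roughly 49% figure.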

Cheers, Julian

21 August, 2009
