ctodx

Discussions, news and rants from the CTO of Developer Express, Julian M Bucknall

DevExpress Newsletter 8: Message from the CTO

     

My Message from the CTO in our eighth newsletter:

Back in 2007 Google released a study on the lifetimes of hard disk drives. They are in a pretty unique position to do so, since they operate huge server farms that run the majority of the world's searches, emails, and other cloud-like services.

Although they didn't single out any particular manufacturer or brand as better or worse than the norm, they did say that a new disk drive has about a 2% chance of failing in a year. For drives that are 2 or 3 years old, the rate is significantly higher: over 8%.

Let's take that lower probability, and run with it.

So, the disk in your newish laptop has a one in 50 chance of dying this year. Doesn't sound too bad, surely? That's roughly like shuffling a pack of cards thoroughly, placing the pack on the table, cutting it, and turning over the top card to find it's the ace of spades.

Taking it further: in our house we have several PCs containing, in all, 8 newish disk drives, spinning all the time. Using Google's figure, the probability that at least one of those 8 dies this year is 15%.
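If you'd rather watch the 15% figure emerge than take the formula on trust, a quick Monte Carlo simulation will do it. This is a minimal sketch, assuming (as the arithmetic in this post does) that each drive fails independently with a 2% chance per year; the function name and the seed are made up for illustration.

```python
import random

def simulate_any_failure(p_fail: float, n_drives: int, years: int, seed: int = 1) -> float:
    """Estimate the chance that at least one of n_drives fails in a year.

    Assumes every drive fails independently with probability p_fail per year.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    bad_years = 0
    for _ in range(years):
        # A simulated year is "bad" if any drive's random draw falls below p_fail
        if any(rng.random() < p_fail for _ in range(n_drives)):
            bad_years += 1
    return bad_years / years

estimate = simulate_any_failure(p_fail=0.02, n_drives=8, years=100_000)
print(f"{estimate:.3f}")  # close to 0.15; the exact digits depend on the seed
```

With 100,000 simulated years the estimate typically lands within a fraction of a percentage point of the exact answer.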

One in 7. That's somewhere between tossing three heads in a row with a coin (1 in 8) and throwing a six with a single die (1 in 6). Suddenly it doesn't seem that remote any more. (If the drives were older, it would be roughly 1 chance in 2 that at least one of them fails.)
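The odds in these analogies can be checked numerically. This small sketch assumes 8 independent drives with a 2% (or, for older drives, 8%) annual failure rate, converts the results into "1 in N" odds, and compares them with the coin, die, and card odds.

```python
from fractions import Fraction

# Assumes 8 independent drives, as in the post.
p_new = 1 - 0.98 ** 8   # each drive has a 2% annual failure chance
p_old = 1 - 0.92 ** 8   # each drive has an 8% annual failure chance

three_heads = Fraction(1, 2) ** 3   # 1/8 = 0.125
six_on_die = Fraction(1, 6)         # about 0.167
ace_of_spades = Fraction(1, 52)     # about 0.019, close to one drive's 2%

print(f"new drives: 1 in {1 / p_new:.1f}")   # new drives: 1 in 6.7
print(f"old drives: 1 in {1 / p_old:.1f}")   # old drives: 1 in 2.1
print(three_heads < p_new < six_on_die)      # True
```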

So, when was the last time you backed up? And given that, like me, you're probably the free tech gopher for all your relatives' and friends' machines, when did they last back up?

OK, so this message is only tangentially related to development, but, as you might have guessed, the subject came up for me recently in the "tech gopher" sense. Ever since my wife lost a lot of data to a crashed hard disk and I discovered that the latest backup was a couple of months old, I've been fairly obsessive about backups (you would be too, if you'd heard the words spoken in the Bucknall household that day). As for friends and relatives, I've been trying to coerce them into using something like JungleDisk or Mozy so that at least their documents would survive.

(By the way, for those who wonder how to calculate these probabilities, here's the math. If a given disk drive has a probability of 0.02 of failing this year, it has a probability of 0.98 of staying alive. The probability that all 8 drives in our house stay alive this year is 0.98^8, or about 0.85. So, the probability of at least one failing is 0.15, or 15%. For a 0.08 probability of dying per drive, each drive has a 0.92 probability of surviving, so for 8 drives that's 0.92^8, or 0.51, for them all staying alive, or a 49% chance that at least one will die this year. A coin toss. (I'll note that Google's paper gives 0.017 as the probability of a drive failing in the first year; I just rounded up to make the point.))
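Here is the same calculation as a small Python function. It's a sketch that assumes drive failures are independent of one another, and the function name is just illustrative. It also runs Google's unrounded 0.017 figure for comparison.

```python
def p_at_least_one_failure(p_fail: float, n_drives: int) -> float:
    """Chance that at least one of n_drives fails this year.

    Assumes every drive fails independently with probability p_fail.
    """
    p_all_survive = (1 - p_fail) ** n_drives  # e.g. 0.98 ** 8, about 0.85
    return 1 - p_all_survive                  # complement: at least one dies

for p_fail in (0.017, 0.02, 0.08):
    print(f"p_fail={p_fail}: {p_at_least_one_failure(p_fail, 8):.1%}")
```

With the rounded 2% and 8% rates it reproduces the 15% and 49% quoted above; the unrounded 0.017 gives a somewhat lower figure.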

Published Aug 12 2009, 11:54 AM by Julian Bucknall (DevExpress)
Technorati tags: Newsletter

Comments

 

 

Tomas said:

Honestly, none of my friends' and relatives' hard drives have broken in the past ten years. It could also be considered that PCs are usually replaced every 5-7 years, but I still see the chance as quite low... Make backups anyway!

August 13, 2009 12:22 PM
 

Ivan said:

Something is a little off in your probability counts.

2x8 = 16 not 15.

and 8x8 = 64, which makes it 1 in 1.56 chance one of your old drives will die this year, not 1 in 2.  

August 13, 2009 1:04 PM
 

Aaron said:

So how do you back up? Seriously... We do our backups here to removable hard drives. I bet the failure rate on those is a lot higher, since they get moved around: even setting one down on a table a little too hard might shake a drive head. Granted, we use multiple removable drives, so if one fails we have options. Plus, 2 of our servers back up to each other, giving a second restore option. Tape drives are definitely out of the question, especially for home use. Their pricing and reliability (in my experience) just make them not worth it.

So what's the best option for home use when you have, say, over 20 gig worth of photos? I've been using DVDs, but even then people say that they degrade after a period of time... It's not like I use these files daily, so I would need long-term storage...

August 13, 2009 1:29 PM
 

Jim said:

Please don't tell me you leave all your PCs on all the time, even when not in use?

Don't give me that 'wears them out' bunkum: I've turned mine off every night and put them to sleep regularly with no issue. This is a computer, not a 1970s Chevy.

Come on, get green!

August 13, 2009 1:31 PM
 

William Vaughn said:

In my experience, drives typically fail about 5 years into their service life or 90 seconds after their warranty expires. I always buy the 5-year warranty drives, as the lifetime of the cheaper drives seems to be significantly shorter. Consider that replacement drives are often "remanufactured" and might not last out the last year of the warranty. Heat, use, vibration (induced into the cabinet by anything), high-G shock, and more factors also shorten life. I speak from 35+ years in the business: losing the boot drive from your domain controller (as I have), only to discover that the Acronis backups are "corrupt", can wreck your month.

August 13, 2009 1:50 PM
 

Mark O'Neal said:

Nice article, Julian. I would just like to add the importance of auditing your backups. We have over 70 customers using the Mozy backup service. We monitor these backups for an additional service fee to our customers. Our monitoring has shown that on any given day, up to 10% of the backups are flawed in some way. Usually it is because the user has a dodgy internet connection. Sometimes, for unknown reasons, Mozy just stops sensing files that have changed and need to be backed up. In this latter case, the solution has always been to uninstall and reinstall the Mozy client.

My point is, no backup solution is perfect and you have to stay on top of yours and make sure that your backups contain what you expect.
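One way to audit a backup, independently of whatever the backup service itself reports, is to compare checksums of the original files with those of the backed-up copies. The sketch below assumes both sets of files are reachable as ordinary folders (for example, a restored copy or an external drive); it is not Mozy's API or any other product's, and the function names and paths are hypothetical.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large photos don't fill RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksums(root: Path) -> dict:
    """Map each file's path, relative to root, to its checksum."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in root.rglob("*") if p.is_file()}

def audit_backup(source: Path, backup: Path) -> dict:
    """List source files that are missing from, or differ in, the backup."""
    src, bak = checksums(source), checksums(backup)
    return {
        "missing": sorted(name for name in src if name not in bak),
        "changed": sorted(name for name in src if name in bak and src[name] != bak[name]),
    }

# Hypothetical paths; point these at your own data and backup copy.
# report = audit_backup(Path("C:/Users/me/Documents"), Path("E:/Backup/Documents"))
```

A "changed" entry can be legitimate (the file was edited after the last backup), so the report is a prompt to look, not proof that the backup is broken.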

August 13, 2009 2:10 PM
 

Mark Hatch said:

Great post and spot on. To get the disclaimer out of the way: I'm the VP of Software Development at Intronis, and we provide online backup services (and use DevExpress too, which is how I found this post). Since we have so many disks spinning at any given time, I can tell you that those percentages are accurate. Even we do not rely on a single array configuration; we back up our backups redundantly, in two data centers on opposite coasts. Disks are mechanical items that fail. It's not a question of IF, it's a question of WHEN. And any service that fluffs the question of a backup of its backups isn't really taking these events seriously. So make sure you ask that question.

So when the disk fails, as it eventually will, you need a backup. And we all (me included) are bad about manual backups. So setting up an online backup to keep you safe is by far the best option in all failure scenarios, especially for all you coders... I can't imagine finding out the last backup of my code was a month ago! Keep your data safe out there.

August 13, 2009 2:43 PM
 

Richard Thor said:

Backup is the first thing you should think of, and the last thing you actually do. I use an external HD to backup my projects (when I think of it), and usually just ZIP up the files.

However, a few weeks ago, I synced with my new HTC phone, and it randomly trashed my contacts. Guess what... no backups. So I am now using iDrive to save it all.

Why is it we tell others the best things to do, and leave our own data to the last?

- Richard.

August 13, 2009 3:04 PM
 

Bill said:

I suspect those failure rates are for drives in heavy use. Our company has 30 machines; they were all replaced when they were at least 5 years old, the new machines are probably a year old now, and there have been no HD failures. However, we have 3 servers and have had 2 drives fail in the past five years. So I don't really think those stats apply to PCs not used as servers.

August 13, 2009 3:10 PM
 

aruest said:

The problem with backups is that data from one hard drive is typically backed up to another hard drive which can also fail. This happened to me recently at home when one of the hard drives on our NAS failed (4X250 Gig). Luckily, we run the NAS as a RAID configuration and thus did not lose any data. Until we are able to move to solid state "drives", I will always feel less than secure doing backups to hard drives.

August 13, 2009 3:41 PM
 

R. Medeiros said:

Mark's last paragraph is very correct! I know people that thought that they had everything they needed to restore an application (like a SharePoint App) and they didn't, so the app could not be restored.

Or even that file that you needed so much didn't make it into your backup!

So, please do as Mark said, make sure your backup contains what you expect, and if they do, try to restore them, as a test, to make sure they are alright.
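A test restore can be partly automated too. This sketch assumes the backup is a plain ZIP archive, like the zipped project files Richard mentions above; the helper names are hypothetical. It unpacks the archive into a throwaway folder and lists any source file that did not come back byte-for-byte.

```python
import shutil
import tempfile
from pathlib import Path

def make_zip_backup(source: Path, dest_base: Path) -> Path:
    """Zip everything under source; make_archive adds the .zip extension."""
    return Path(shutil.make_archive(str(dest_base), "zip", root_dir=str(source)))

def verify_restore(archive: Path, source: Path) -> list:
    """Restore archive into a scratch folder and return the relative paths of
    source files that are missing from it or not byte-for-byte identical."""
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch)
        shutil.unpack_archive(str(archive), str(restored))  # format from ".zip"
        for original in sorted(source.rglob("*")):
            if not original.is_file():
                continue
            rel = original.relative_to(source)
            copy = restored / rel
            if not copy.is_file() or copy.read_bytes() != original.read_bytes():
                problems.append(rel.as_posix())
    return problems
```

Keep the archive outside the folder being backed up; otherwise each new backup would try to include the previous one.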

August 13, 2009 3:46 PM
 

Julian Bucknall (DevExpress) said:

Mark: An excellent point, and a very interesting gotcha with Mozy that I didn't know about.

Cheers. Julian

August 13, 2009 4:59 PM
 

Julian Bucknall (DevExpress) said:

Ivan: I would reread the last paragraph of the post to understand how to calculate the probabilities.

Tomas: I've had two drives die in the past five years or so. Both of them on my wife's laptop. The first time I had a pretty recent backup, and I slapped myself on my back for my prescience. When the next one died, I was well and truly caught with my pants down. Now, why it's the drives in my wife's laptop that are prone to dying, I'm not going to speculate...

Aaron: I auto-backup everything on my main internal drives to an external drive (once a day) using Acronis TrueImage. It's a gamble (what if the internal and external drives die at the same time?), but the probability of that would be pretty remote (0.04%, according to Google's stats). I also back up all documents (incl. photos) to another PC and to the cloud (I use JungleDisk and Amazon S3). Ditto for my wife's data. (So all data files are on four different drives.) I used to use CD-ROMs and DVD-ROMs, but I couldn't be bothered with the disk swapping.
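The 0.04% figure is just the two 2% chances multiplied together. A quick check follows; it assumes the drives fail independently, which is optimistic when they sit in the same house on the same power supply, and it treats the cloud copy as just another 2% drive.

```python
P_FAIL = 0.02  # rounded annual failure chance per drive, from the post

# Assumes the drives fail independently of each other.
p_internal_and_external = P_FAIL ** 2   # both drives of the backup pair die in one year
p_all_four_copies = P_FAIL ** 4         # every one of four copies is lost

print(f"{p_internal_and_external:.2%}")   # 0.04%
print(f"{p_all_four_copies:.1e}")         # 1.6e-07
```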

Jim: My desktop is also the household file and print server, so it's on all the time. Thinking of getting a WHS machine instead (smaller, less powerful, uses less juice).

William: loved the throwaway line about the drive dying just after the warranty expires. Been there with the broadband router I use...

Cheers, Julian

August 13, 2009 5:30 PM
 

Jean-Luc Praz said:

My last complete local backup was done an hour ago; my last online backup some time earlier in the day. Why? The content on my HD is what drives my business, so I can't afford to lose it.

Not that I am immune to data loss: about 3 months ago my laptop's HD died after 3 years of use. That is when I noticed that I had some important data on it; I hadn't thought so until then.

HD failure is one part of the picture. What about stolen PCs? About 5 years ago somebody broke into my office: gone were the PC, the flat screen (not cheap back then), the iPaq, and the digital camera. Thank God the external drive, containing a fresh backup, was not in the same place. Since then I have also been talking to all my clients about the backup issue.

I usually start like this: "What would happen to your business if one day you found out that your PC or laptop was just gone?"

August 13, 2009 6:15 PM
 

Sanjeev narayan said:

Good point to note, and yes, I have turned over quite a few aces in the last year with one particular brand of hard drive.

I am not sure the Google study covers solid state drives. What is the likelihood of SSDs crashing?

August 13, 2009 8:35 PM
 

Julian Bucknall (DevExpress) said:

Sanjeev: since SSDs are solid state, no moving parts, they'd be like your motherboard crashing, or your memory going bad or something. The issue with SSDs is that they "wear out" (the flash memory can only be written to ~10,000 times) and there's quite a bit of circuitry present that tries to balance out the "wear" of the flash memory chips across the whole drive. For a drive that's being written to constantly, you'd find the total capacity slowly being reduced as the flash memory cells start to "wear out".
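To make the wear-leveling idea concrete, here is a deliberately simplified toy model, not how any real SSD controller works. It treats the drive as a handful of blocks with erase counters, uses the ~10,000-write figure above as the limit, and compares rewriting one block in place with always writing to the least-worn block. The class and method names are hypothetical.

```python
ERASE_LIMIT = 10_000  # rough write endurance per flash cell quoted above

class ToyFlash:
    """Hypothetical toy SSD: a few blocks, each with an erase counter."""

    def __init__(self, n_blocks: int):
        self.wear = [0] * n_blocks

    def write_in_place(self, block: int) -> None:
        # No wear leveling: rewriting the same data always hits the same block.
        self.wear[block] += 1

    def write_leveled(self) -> None:
        # Simplified wear leveling: send each write to the least-worn block.
        target = min(range(len(self.wear)), key=lambda i: self.wear[i])
        self.wear[target] += 1

    def worn_out(self) -> int:
        """Number of blocks that have reached the erase limit."""
        return sum(1 for w in self.wear if w >= ERASE_LIMIT)

in_place, leveled = ToyFlash(8), ToyFlash(8)
for _ in range(40_000):
    in_place.write_in_place(0)   # e.g. one file rewritten over and over
    leveled.write_leveled()

print(max(in_place.wear), in_place.worn_out())  # 40000 1
print(max(leveled.wear), leveled.worn_out())    # 5000 0
```

The same 40,000 writes burn out one block without leveling but leave every block at 5,000 erases with it. Real controllers also remap logical addresses and relocate rarely changed data, which this sketch leaves out.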

Cheers, Julian

August 13, 2009 11:16 PM
 

Footsoldier said:

RAID cards and extra hard drives are cheap as chips. I tend to run "App" and "Data" drives and make sure that all the software I need to rebuild the "App" drive is to hand on permanent media. If you cost up your time spent rebuilding, then a RAID setup (even on a home PC) pays for itself in nothing flat.

As for being the tech gopher, try ringing a few friends at 9.30 p.m. now and again to ask for urgent help in their professional specialty. My buddy the Spanish teacher eventually got the message after a lengthy discussion of Castilian verb tenses late one night. (Don't try this with health professionals: you'll find out things you never wanted to know...)

August 14, 2009 4:40 AM
 

Jesse McClusky said:

Can I just say this is the reason I *love* Windows Home Server? I've been running one in my home with 5 drives for just over 2 years now (since the beta), and I've already gone through a seamless hard-drive failure with no data loss. WHS has to be far and away the best product MS has ever produced.

August 14, 2009 12:28 PM
 

Donn Edwards said:

I use *HDTune* on my laptop to prevent the drive from overheating. I have lost an entire drive because of a motherboard fault that caused it to overheat. I literally burnt my fingers.

I use *Spinrite* every 3-4 months to check the status of each drive and test the surface. It has prevented several disasters.

Finally, I use a USB external drive for backups and archives. And I use *Acronis TrueImage* to make a complete disk image of my laptop, so if I'm struck by a virus or drive failure I can get my apps, registry settings and data back.

I've had too many close calls to NOT do backups, and with laptop theft a common occurrence, I can't afford to lose all my data.

August 14, 2009 4:22 PM
 

Corey Kosak said:

I think the most sensible "should I back up?" metric is not the probability of loss, but rather the expected value of the loss.  All my work is checked into source control (elsewhere), so for me the expected loss is the last few hours' work, plus the cost of an OS and app reinstall, times two percent, which makes it acceptable to do no additional backups.  However, when I tell my mom that in the coming year, there is a 2% chance that she will irretrievably lose all of her precious photos, documents, and records, that same 2% number doesn't feel "low" to her at all.
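Corey's rule of thumb can be written down directly: expected loss is the probability of failure times the cost of the loss, compared with what the backup itself costs you. The costs below are invented, in hours of effort, purely for illustration, and the comparison ignores the fact that some losses (Mom's photos) can't really be priced at all.

```python
P_FAIL = 0.02  # annual failure chance for one drive, as used in the post

def expected_loss(p_fail: float, cost_of_loss: float) -> float:
    """Expected value of the loss: chance it happens times what it costs."""
    return p_fail * cost_of_loss

def worth_backing_up(p_fail: float, cost_of_loss: float, backup_cost: float) -> bool:
    """Back up when the expected loss exceeds the yearly cost of backing up."""
    return expected_loss(p_fail, cost_of_loss) > backup_cost

# Invented costs, in hours of effort, purely for illustration.
developer = worth_backing_up(P_FAIL, cost_of_loss=10, backup_cost=2)   # 0.2 h expected: False
photos = worth_backing_up(P_FAIL, cost_of_loss=1_000, backup_cost=2)   # 20 h expected: True
```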

August 15, 2009 8:05 AM
 

Bernd Maierhofer said:

I too am one of the lonely fighters for complete, frequent, and automated backups. That said, I want to draw your attention to the fact that the given statistics may be incorrect. As the probability of one disk failing is independent of another disk failing, you must not multiply probabilities but add them. So "The probability that at least one of those 8 dying this year according to Google's probability is 15%" must read "is 2%". But of course disks have a kind of paradise at Google's: no changing temperature, climate control, ..., nothing we have at home.

Cheers Bernd

August 18, 2009 3:15 AM
 

Algorithms for the masses - julian m bucknall said:

I dashed off a quick " Message from the CTO " about disk drives failure probabilities and backups for the eighth DevExpress newsletter , and, for such a quick message, it really resonated with the customers. It's always the way: it seems the

August 19, 2009 10:33 PM
 

Julian Bucknall (DevExpress) said:

Bernd: I think you're mistaken there. Agreed the probabilities are independent, but you must multiply them.

Consider the analogy with dice. With one die, the chance of getting a 6 is 1/6. So the chance of NOT getting a 6 is 5/6. If I have two dice, the chance of NOT getting a six at all is 5/6 * 5/6, or 25/36. The other 11/36 of the probability is taken up with a six appearing on one (or both, of course) of the dice.
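The dice argument can be verified by brute force: list all 36 outcomes of two dice and count. This sketch also shows why simply adding the individual probabilities goes wrong: it counts the double six twice (just as adding 2% eight times gives 16% for the drives, slightly more than the correct 14.9%).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))        # all 36 (die1, die2) pairs
no_six = sum(1 for a, b in outcomes if 6 not in (a, b))
some_six = len(outcomes) - no_six

print(Fraction(no_six, 36), Fraction(some_six, 36))    # 25/36 11/36

by_multiplying = 1 - Fraction(5, 6) ** 2   # complement of "no six on either die"
by_adding = Fraction(1, 6) + Fraction(1, 6)
print(by_multiplying, by_adding)            # 11/36 1/3  (1/3 = 12/36: (6, 6) counted twice)
```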

Cheers, Julian

August 21, 2009 6:57 PM

About Julian Bucknall (DevExpress)

Julian is the Chief Technology Officer at Developer Express. You can reach him directly at julianb@devexpress.com. You can also follow him on Twitter with the ID JMBucknall.