Drobo Drive Failure: Lesson Learned? Again?

Well, my Drobo is up and running again.  All my data appears intact.  Wonderful!  Now we go merrily along like nothing happened, right? Not quite.

Today, I have the backup bug.  The backup bug is a wonderful thing.  Simply put, I have the incredible urge to back everything up.  Make multiple copies of everything I don’t want to lose and securely distribute those copies out into the universe where, one day, I might call on them to resurrect themselves.  Yeah!

I also know something about the backup bug: it will go away quickly.  Today my data are safe, and they’ll probably be safe tomorrow.  I can put it off for a couple of days while I resolve other priorities.  Days will become weeks and weeks will become an unknown amount of time until my next hard drive failure.  It will come.

When that hard drive failure comes again, and it will, I’ll be kicking myself, fixing the problem and hopefully moving on with minimal data damage, but with a huge loss of time.  While I probably will have minimal losses, it’s still a waste.  With 2 TB hard drives getting down to as little as $100 these days, it’s cheaper to buy a bunch and consistently back up than to lose a couple of days fixing the problem.  It’s a hard lesson to learn.

What really kills off my backup bug is organization.  I have the space to properly back up, particularly today.  However, the question becomes what to back up and how.  I don’t want to back up everything, just those things that are not adequately backed up.  I have several categories for my data:

1. Active  – Active data are those things I’m actively modifying.  Writing projects, for example.  These require constant backup.

2. Inactive – Formerly active projects that are retired.  I may want them on my computer or server, but they only need multiple static copies.

3. Configuration – those files that are used to configure my computer – the OS, for example.  These, like Active projects, should be constantly backed up but not necessarily in the same place.

4. Replaceable – files that are not really mine, like an application disk image, and that can be reacquired generally shouldn’t be backed up, unless they are no longer available.

5. Temporary – judgement call – once it’s backed up, it might be around for a really long time.
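
The five categories above could be written down as a small lookup table, which is one way to make the "what and how" decision mechanical.  This is only a sketch: the names `Category`, `Policy`, `POLICIES` and `needs_backup` are made up for illustration, and treating Temporary as "skip by default" is my assumption about where the judgement call usually lands.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    CONFIGURATION = "configuration"
    REPLACEABLE = "replaceable"
    TEMPORARY = "temporary"


@dataclass(frozen=True)
class Policy:
    back_up: bool      # should this category be backed up by default?
    continuous: bool   # constant backup rather than static copies?
    note: str


# Hypothetical mapping of the five categories to backup policies.
POLICIES = {
    Category.ACTIVE: Policy(True, True, "constant backup"),
    Category.INACTIVE: Policy(True, False, "multiple static copies"),
    Category.CONFIGURATION: Policy(
        True, True, "constant backup, not necessarily in the same place"
    ),
    Category.REPLACEABLE: Policy(False, False, "skip unless no longer available"),
    # Assumption: the judgement call defaults to "skip".
    Category.TEMPORARY: Policy(False, False, "judgement call"),
}


def needs_backup(category, still_available=True):
    """Return True if a file in this category should be backed up."""
    if category is Category.REPLACEABLE:
        # Replaceable files only matter once they can no longer be reacquired.
        return not still_available
    return POLICIES[category].back_up
```

Calling `needs_backup(Category.REPLACEABLE, still_available=False)` covers the "no longer available" exception from item 4.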

In any case, none of my files are organized this well… I should probably get to it if I’m going to get anything done before the backup bug goes away.

Another drive failure in my Drobo

Drive failures are a fact of life.  We just have to get used to it and be prepared.  I have a stack of failed drives in my basement, destined for destruction and eventual recycling.  Losing a drive is never happy, is often painful, and is quite disruptive.

One of the most difficult things in computing, particularly home computing, is being prepared for drive (or computer) failures. Let’s face it: it’s expensive and hard.  There are those who wisely suggest the 3-2-1 approach to data protection: 3 copies of everything, 2 media formats (like a hard drive and a DVD), and 1 copy offsite.  Who can really do that? Well, most people, really.  A typical computer user probably has less than 500 GB of data, and probably far less “important” data.  With that amount of data, you could easily satisfy the 3 copies and 1 offsite requirement.   The 2nd media format could also include online storage in the cloud (yes, that counts as offsite too!), which is getting more reasonable all the time.
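
To make the 3-2-1 rule concrete, here is a rough sketch of how you might score a set of copies against it.  The `Copy` record and the `backup_level` and `meets` functions are names I invented for the example, and the media labels are just strings you pick yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    medium: str    # e.g. "hard drive", "DVD", "cloud" (free-form label)
    offsite: bool  # True if this copy lives somewhere else


def backup_level(copies):
    """Return (number of copies, distinct media formats, offsite copies)."""
    media = {c.medium for c in copies}
    offsite = sum(c.offsite for c in copies)
    return (len(copies), len(media), offsite)


def meets(copies, required=(3, 2, 1)):
    """True if every count reaches the required level (3-2-1 by default)."""
    return all(have >= need for have, need in zip(backup_level(copies), required))


# Example: two local hard drives plus one cloud copy.
copies = [
    Copy("hard drive", offsite=False),
    Copy("hard drive", offsite=False),
    Copy("cloud", offsite=True),
]
```

With the example list, `backup_level(copies)` gives `(3, 2, 1)`, so `meets(copies)` is true.  Passing a different tuple, such as `required=(2, 1, 1)`, checks a weaker level instead.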

For me, however, I have tons of data.  In fact, I have so much data that the 3-2-1 approach is particularly difficult for me to meet.  I’m even building a database just to track hard drives!  I probably have somewhere around 9-20 terabytes of drive space floating about.  The question is “How much of that is sufficiently backed up?”  Probably too little of it.  I’d say less than 1% meets the 3-2-1 criteria.  However, I’d say I have at least half at the 2-1-1 level.

For what doesn’t even meet the 2-1-1 level, I have my Drobo.  A Drobo is a bit like a RAID, but requires less knowledge to run and is a bit more flexible.  When it’s in good condition, it protects your data from a drive failure.  For me, it serves 3 basic roles: “temporary” storage of active projects, “archival” storage for things I have backups of elsewhere, and the “pit of despair” for all the data that either came to die and be forgotten or that I just never got around to backing up.  The problem, of course, is that the Drobo somewhat tricks you into thinking that you don’t need to back up your data just yet. You still have to back it up.  Do as I say! (not as I do)

This past Sunday, when trying to get busy, my Drobo Dashboard app started flashing an angry red.  Drive failure.  Remember, all is okay with my data; the Drobo protected it.  However, now all the data are in an unsafe state – the Drobo can no longer ensure the safety of my data until I replace the failed drive.   When drives fail, my goal is simply to buy a larger capacity drive and replace it.  Unfortunately, this failed drive was the largest in my Drobo.  Gak!  So, no upgrade is likely to happen.   Worse, I’m not in a position to run out and get a new drive – I had to order one and the shipment is delayed because of weather.  Gak!

So, I’m offloading data from my Drobo.  Right now, since the largest drive failed, I’m really REALLY over the safe capacity of the Drobo, so I need to get the overall storage level down to meet the new limitations.  That’s about 1 terabyte of data I need to offload.  Gak!   Arguably, I don’t have to do this at all.  Instead I could just wait for the drive to arrive and plop it in.  However, the Drobo’s performance is really really bad in this state.  Furthermore, all those “temporary” and “pit of despair” files are all at risk.  Sigh.

I should point out that I think this might actually be my 3rd drive failure in my Drobo.  I don’t think that’s unusual; over the years I’ve had many, many drives fail.  After all, I run them pretty hard.  It just reinforces the need for backups and redundant drive solutions.

Once everything is good, it’ll be time to check on those warranties.