People are a bit surprised every time they hear this question from us.
Conventional wisdom holds that a Hot-Spare is a very good idea: it minimizes the time an array spends in a degraded state, among other benefits.
So, why is using a Hot-Spare Drive a bad idea?
It is true that a Hot-Spare helps to minimize the duration of a degraded array state. But the goal of building a Redundant Array of Inexpensive Disks is to keep operating, without losing data, when a drive fails. Anything that increases the risk of data loss is a bad idea.
In our many years of experience, we have learned that the probability of an additional drive failure during a RAID rebuild is quite high, because a rebuild puts heavy stress on the remaining drives. This is why we advise following this procedure as soon as the array shows a degraded state as a result of a drive failure:
- Run a full data backup.
- Verify the backed-up data for consistency, and verify whether the data restore mechanism works.
- Identify the source of the problem, i.e. find the faulty hard disk. If possible, shut down the server and make sure the serial number of the hard disk matches the one reported by the RAID controller.
- Replace the hard disk identified as bad with a new, unused one. If the replacement hard drive had already been used within another RAID array, make sure that any residual RAID metadata on it has been deleted via the original RAID controller.
- Start the rebuild of the RAID.
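The consistency check in the second step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the backup is a plain file tree that mirrors the source; the names `sha256_of` and `verify_backup` are hypothetical and not part of any backup product. It compares SHA-256 digests of every file and does not test the restore mechanism itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_root: Path, backup_root: Path) -> list[str]:
    """Return a list of files that are missing from the backup or differ in content.

    Assumption: the backup is an ordinary directory tree mirroring source_root.
    """
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        backup_file = backup_root / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

An empty list means every source file has an identical copy in the backup; anything else should be resolved before the failed disk is touched.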
So with this approach, the rebuild is the fifth and final step! With a Hot-Spare, your RAID skips the first two, very important steps and runs steps 3, 4 and 5 automatically. The rebuild therefore starts before the backup has been made and verified, the very steps that ensure your data is safe.
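The ordering argument can be made concrete with a small sketch. The class below is purely illustrative (the names `Step` and `DegradedArrayChecklist` are hypothetical and do not talk to any RAID controller); it models the five steps above and refuses to record a step until all earlier ones are done, which is exactly the safeguard an automatic Hot-Spare rebuild bypasses.

```python
from enum import IntEnum


class Step(IntEnum):
    """The five manual steps, in the order recommended above."""
    BACKUP = 1
    VERIFY_BACKUP = 2
    IDENTIFY_FAILED_DISK = 3
    REPLACE_DISK = 4
    REBUILD = 5


class DegradedArrayChecklist:
    """Tracks completed steps and rejects any step taken out of order."""

    def __init__(self) -> None:
        self.completed: list[Step] = []

    def complete(self, step: Step) -> None:
        if len(self.completed) == len(Step):
            raise RuntimeError("all steps are already complete")
        expected = Step(len(self.completed) + 1)
        if step != expected:
            raise RuntimeError(
                f"cannot run {step.name} before {expected.name}"
            )
        self.completed.append(step)
```

Calling `complete(Step.REBUILD)` on a fresh checklist raises an error, because the backup and its verification have not been recorded yet.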
Being aware of Murphy’s Law, no one would risk an immediate rebuild after a drive failure – but by using a Hot-Spare this is exactly what will happen. If you stop and think about the integrity of your data, you will come to the same conclusion: a Hot-Spare Drive is a very bad idea.