RAID 5: How does it work?

Updated 23/09/2021

A few years ago, we published the article ‘How does RAID 5 work? The shortest and easiest explanation ever’, which proved very popular among our readers. That’s why we would like to explain the topic once again, this time also taking our readers’ questions into account.

RAID (redundant array of independent disks; originally redundant array of inexpensive disks) is a data storage virtualization technology that combines multiple disk drive components into a logical unit for the purposes of data redundancy or performance improvement.

Source: Wikipedia

RAID 5 explained

RAID 5 requires a minimum of 3 drives. Each data block is written to a data disk, and parity for the blocks in the same rank is generated on writes and checked on reads.

In order to understand RAID 5, you must know XOR:

The binary XOR (exclusive OR) operation has two inputs and one output. The inputs to a binary XOR operation can only be 0 or 1 and the result can only be 0 or 1.

The result of the XOR function is 1 if the two arguments are different.

XOR (0, 1) = 1
XOR (1, 0) = 1

The result of the XOR function is 0 if the two arguments are the same.

XOR (0, 0) = 0
XOR (1, 1) = 0
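The truth table above can be reproduced in a few lines of Python, where `^` is the built-in bitwise XOR operator. This is only a minimal sketch; the function name `xor_bit` is made up for illustration.

```python
def xor_bit(a: int, b: int) -> int:
    """Return the exclusive OR of two bits (each 0 or 1)."""
    return a ^ b

# Print the full truth table for all four input combinations.
for a in (0, 1):
    for b in (0, 1):
        print(f"XOR ({a}, {b}) = {xor_bit(a, b)}")
```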

Now let us assume we have 3 drives with the following bits:

| 101 | 010 | 011 |

We calculate the XOR of this data and place it on the 4th drive.

XOR (101, 010, 011) = 100, because XOR (101, 010) = 111 and then XOR (111, 011) = 100

So the data on the four drives looks like this:

| 101 | 010 | 011 | 100 |
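The same parity calculation can be sketched in Python. The variable names are illustrative assumptions; `functools.reduce` simply applies XOR to the drives one after another, exactly like the two-step calculation above.

```python
from functools import reduce
from operator import xor

# Bit patterns stored on the three data drives in the example.
data_drives = [0b101, 0b010, 0b011]

# XOR the drives pairwise, left to right: (101 ^ 010) ^ 011.
parity = reduce(xor, data_drives)

print(format(parity, "03b"))  # prints 100
```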

If you use more than 3 drives, the XOR calculation follows the same rules:

XOR (0, 0) = 0
XOR (0, 1) = 1
XOR (1, 0) = 1
XOR (1, 1) = 0

This time, let’s assume we are using four drives with the following data (each column is one drive):

1,1,0,1
1,0,0,0
0,0,1,0
0,1,1,0

We calculate the XOR of each line and place it on the 5th drive.

XOR (1,1,0,1) = 1
XOR (1,0,0,0) = 1
XOR (0,0,1,0) = 1
XOR (0,1,1,0) = 0

So the data looks like this:

1,1,0,1,1
1,0,0,0,1
0,0,1,0,1
0,1,1,0,0
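As a rough sketch, the line-by-line parity for the four-drive example could be computed like this in Python. The names `rows` and `rows_with_parity` are assumptions chosen for this illustration.

```python
from functools import reduce
from operator import xor

# Each row is one line of the example; each column is one data drive.
rows = [
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]

# Append the parity bit (the fifth drive) to every row.
rows_with_parity = [row + [reduce(xor, row)] for row in rows]

for row in rows_with_parity:
    print(",".join(str(bit) for bit in row))
```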

And now let’s simulate a failure of the 3rd drive:

1,1,x,1,1
1,0,x,0,1
0,0,x,0,1
0,1,x,0,0

In order to recover the data from this drive, calculate x as the XOR of the values on the 1st, 2nd, 4th, and 5th drives:

x for the 1st line from XOR (1,1,1,1) = 0 / why? 1,1=0 -> 0,1=1 -> 1,1=0
x for the 2nd line from XOR (1,0,0,1) = 0
x for the 3rd line from XOR (0,0,0,1) = 1
x for the 4th line from XOR (0,1,0,0) = 1

You always calculate the XOR of the first two numbers and then the XOR of that result with the next number, and so on. The order in which you take the numbers does not matter for XOR.
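The recovery of the failed third drive can be sketched in Python as follows. Using `None` to mark the lost values and the helper name `rebuild_bit` are assumptions made for this example.

```python
from functools import reduce
from operator import xor

# Rows after the failure; None marks the lost third drive.
degraded = [
    [1, 1, None, 1, 1],
    [1, 0, None, 0, 1],
    [0, 0, None, 0, 1],
    [0, 1, None, 0, 0],
]

def rebuild_bit(row):
    """XOR every surviving bit in the row to recover the missing one."""
    return reduce(xor, (bit for bit in row if bit is not None))

recovered = [rebuild_bit(row) for row in degraded]
print(recovered)  # prints [0, 0, 1, 1]
```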

Let’s go back to the first example and assume the second drive has failed.

When we calculate the XOR of all the remaining data, the result is the data from the missing drive.

| 101 | 010 | 011 | 100 |

XOR (101, 011, 100) = 010

You can check this for any other missing drive: the XOR of the remaining data will always give you the exact data of the missing drive. For example, if the third drive fails:

| 101 | 010 | 011 | 100 |

XOR (101, 010, 100) = 011
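To check every possible single-drive failure at once, a small loop is enough. This is only an illustrative sketch; the helper `rebuild` and its parameters are hypothetical names.

```python
from functools import reduce
from operator import xor

# Three data drives plus the parity drive from the example.
drives = [0b101, 0b010, 0b011, 0b100]

def rebuild(drives, failed):
    """Recover the content of drive `failed` by XOR-ing all the others."""
    survivors = [d for i, d in enumerate(drives) if i != failed]
    return reduce(xor, survivors)

for failed in range(len(drives)):
    rebuilt = rebuild(drives, failed)
    print(f"drive {failed + 1} lost: rebuilt {rebuilt:03b}, original {drives[failed]:03b}")
```

The same check works for the parity drive itself, because XOR treats data and parity in exactly the same way.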

What works for 3 bits and 4 drives works for any number of bits and any number of drives. In a real RAID 5, the most common stripe size is 64k (65,536 bytes * 8 = 524,288 bits).

A real XOR engine therefore has to deal with 524,288 bits at a time, not 3 bits as in our exercise. This is why RAID 5 needs a very efficient XOR engine to calculate parity fast.
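To get a feeling for the 64k case, here is a minimal Python sketch that XORs whole 65,536-byte blocks. The random bytes stand in for real data, and `xor_blocks` is a hypothetical helper; a real RAID implementation uses optimized or hardware XOR routines rather than anything like this.

```python
import os

STRIPE_SIZE = 64 * 1024  # 65,536 bytes = 524,288 bits

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks by treating each one as a big integer."""
    acc = 0
    for block in blocks:
        acc ^= int.from_bytes(block, "big")
    return acc.to_bytes(len(blocks[0]), "big")

# Three data blocks filled with random bytes (stand-ins for real data).
data = [os.urandom(STRIPE_SIZE) for _ in range(3)]
parity = xor_blocks(*data)

# Lose the second block and rebuild it from the other two plus parity.
rebuilt = xor_blocks(data[0], data[2], parity)
print(rebuilt == data[1])  # prints True
```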

By adding one drive for parity, you will be able to rebuild the missing data in case of any single drive failure.

In our example, we have actually explained RAID 4, where parity is on a dedicated drive. The only difference between RAID 4 and RAID 5 is that RAID 5 distributes parity evenly across all drives, while RAID 4 keeps parity on a dedicated drive. Distributed parity provides a slight increase in performance, but the XOR magic is the same.
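The difference in parity placement can be illustrated with a short Python sketch. Both function names are hypothetical, and the rotation used for RAID 5 (parity moving one drive to the left with each stripe) is just one assumed layout; real implementations use various rotation schemes.

```python
def parity_drive_raid4(stripe: int, num_drives: int) -> int:
    """RAID 4: parity always sits on the last (dedicated) drive."""
    return num_drives - 1

def parity_drive_raid5(stripe: int, num_drives: int) -> int:
    """RAID 5 (assumed rotation): parity moves one drive to the left
    with every stripe and wraps around."""
    return (num_drives - 1 - stripe) % num_drives

# Show which drive (0-based index) holds parity for the first stripes.
for stripe in range(4):
    print(
        f"stripe {stripe}: RAID 4 parity on drive {parity_drive_raid4(stripe, 4)}, "
        f"RAID 5 parity on drive {parity_drive_raid5(stripe, 4)}"
    )
```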

You can find a more detailed description of what Redundant Array of Independent Disks means in the newest article by Grzegorz Walasek: The Fundamentals of RAID