After RAID 0 and RAID 1 (with RAID 1+0 and RAID 0+1) it is time for RAID 2, 3 and 4. Here we present a short description of these levels, which we hope will give a picture of how they work. Although the article is something of a history lesson (these solutions are rarely used today), it is good to be aware of the origins of modern storage technologies.
RAID 2 – bit-level striping with dedicated Hamming-code parity
In RAID 2 all data are striped at the bit level rather than the block level. Each consecutive bit is written to a different drive. Such a solution requires the use of a Hamming code for error correction.
Hamming code is a linear error-correcting code named after its inventor, Richard Hamming. A code whose minimum Hamming distance between any two code words is d can detect up to d – 1 bit errors, or correct up to (d – 1) / 2 bit errors (rounded down). Hamming codes have d = 3, so they can correct any single-bit error or, alternatively, detect up to two-bit errors; reliable correction is therefore possible when the Hamming distance between the transmitted and received bit patterns is at most one. By contrast, the simple parity code cannot correct errors, and can detect only an odd number of errors.
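To make the single-error correction concrete, here is a minimal sketch of the classic Hamming(7,4) code, in which 4 data bits are protected by 3 parity bits. The function names are hypothetical, and the layout (one bit per drive, seven drives) is only an assumption used to mirror the RAID 2 idea; it is not taken from any particular controller.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Positions 1, 2 and 4 (1-based) hold parity bits,
    positions 3, 5, 6 and 7 hold the data bits.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based position of the flipped bit or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome read as a binary number
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return c, position


# Each of the 7 bits would live on a different drive in a RAID 2-style layout.
word = hamming74_encode([1, 0, 1, 1])
damaged = list(word)
damaged[4] ^= 1  # simulate a bad bit coming from the fifth drive
repaired, position = hamming74_correct(damaged)
print(position, repaired == word)
```

The syndrome directly names the position of the faulty bit, which is why no separate information about the failing drive is needed to correct a single error.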
The number of disks in RAID 2 that hold the Hamming code grows roughly with the logarithm of the number of disks that store the data being protected. All disks in RAID 2 work as one disk whose capacity equals the combined capacity of the disks used to store data.
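The logarithmic relationship can be checked with a few lines of arithmetic. The sketch below assumes a plain single-error-correcting Hamming code, for which r check bits can protect m data bits when 2^r >= m + r + 1; the function name is hypothetical and real arrays may have used extended codes with an additional check disk.

```python
def hamming_check_disks(data_disks):
    """Smallest r with 2**r >= data_disks + r + 1 (single-error-correcting code)."""
    r = 1
    while 2 ** r < data_disks + r + 1:
        r += 1
    return r


for m in (4, 10, 32):
    print(m, "data disks need", hamming_check_disks(m), "check disks")
```

Under this assumption 4 data disks need 3 check disks, while 32 data disks need only 6, so the relative overhead shrinks as the array grows.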
When RAID 2 is used it is important to synchronize all disks. The controller must keep the disks spinning at the same angular orientation; otherwise they will not reach the index at the same time. Loss of synchronization makes the drives in the array completely useless.
This requirement is not the only drawback. Generating the long Hamming codes may also prove problematic, because it slows the whole system down.
The way RAID 2 operates may be hard to understand. The need for Hamming code and for special disk controllers makes RAID 2 a not very popular solution. But if we think about it in a less pragmatic way, it may prove to be very interesting, mainly due to its modus operandi. It introduces many more complex mechanisms than RAID 0 and RAID 1. As long as everything works well, RAID 2 proves to be quite a good solution in the area of data security. In case of HDD failure, no matter whether it was a disk with data or with the Hamming code, the lost part of the array may be reconstructed from the remaining disks.
While it is interesting and has its advantages, we have not heard about any commercial implementations of RAID 2. Solutions based on it were used only in the initial phase of RAID adoption, before disks were equipped with their own correction code. Modern HDDs use various correction and optimizing algorithms. That is why the Hamming scheme has become less interesting for professional use and is no longer implemented in modern controllers.
RAID 3 – another rare one in practice
RAID 3 works much as RAID 0 does, using byte-level striping, but it adds one extra disk to the array. That disk is used to store checksums, and a dedicated processor is used to calculate the parity codes, so we may call it “the parity disk”.
In a RAID 3 configuration, data are divided into individual bytes and then saved across the data disks. A parity byte is determined for each row of data and saved on the mentioned “parity disk”. In case of failure, the lost data can be recovered by an appropriate calculation on the remaining bytes and the parity bytes that correspond to them.
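The parity calculation and the recovery step can be sketched in a few lines. The example assumes XOR parity (the usual choice for a single parity disk) and uses in-memory byte strings in place of real drives; the function names are hypothetical.

```python
from functools import reduce


def xor_all(values):
    """XOR a sequence of integers together."""
    return reduce(lambda a, b: a ^ b, values, 0)


def stripe_bytes(data, data_disks):
    """Spread bytes round-robin over the data disks and compute the parity disk."""
    padded = data + bytes(-len(data) % data_disks)  # zero-pad the last row
    disks = [padded[i::data_disks] for i in range(data_disks)]
    parity = bytes(xor_all(row) for row in zip(*disks))
    return disks, parity


def rebuild_disk(disks, parity, failed):
    """Recompute the contents of one failed data disk from the survivors and parity."""
    survivors = [d for i, d in enumerate(disks) if i != failed]
    return bytes(xor_all(row) for row in zip(*survivors, parity))


disks, parity = stripe_bytes(b"RAID3 demo", data_disks=3)
print(rebuild_disk(disks, parity, failed=1) == disks[1])
```

Because XOR is its own inverse, the same operation that produces the parity byte also restores a missing byte, as long as only one disk in the row is lost.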
Although RAID 3 is rarely used in practice, it is worth pointing out its advantages. The first is its resistance to the failure of one disk in the array. The second is high read speed. Unfortunately, it also has a couple of drawbacks.
The read speed is more than satisfactory, but the write speed is quite the opposite, the reason being the need to calculate checksums (even hardware RAID controllers cannot solve this problem). The second disadvantage concerns disk failure. When it happens, the whole system works much slower. What is more, although RAID 3 is resistant to breakdown (in case of failure of one disk in the array), replacing a damaged disk is very costly. A third problem is the disk used for storing checksums, which is usually the bottleneck in the performance of the entire array.
As can easily be seen, RAID 3 is not a particularly good, reliable or cheap solution. Therefore, as mentioned earlier, its use is rare in practice. Systems based on RAID 3 are mostly intended for deployments where a small number of users access very large files.
RAID 4 – smells like RAID 3 and 5
RAID 4 is very similar to RAID 3. The main difference is the way data are divided. They are split into blocks (16, 32, 64 or 128 kB) and written to the disks, similar to RAID 0. For each row of written data blocks, a parity block is written to the parity disk. In short, RAID 4 does not stripe data at the byte level; it stripes at the block level (block-level striping with a dedicated parity disk).
It also resembles RAID 5, except that it confines all parity data to a single drive. RAID 4 does not use distributed parity.
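The block layout can be illustrated with a small sketch. It assumes the simplest possible mapping, with logical blocks laid out row by row across the data disks and the parity disk always being the last one; the names are hypothetical and real controllers may order blocks differently.

```python
def raid4_locate(logical_block, data_disks):
    """Map a logical block number to (stripe row, data disk index)."""
    return divmod(logical_block, data_disks)


def parity_block(blocks):
    """XOR equally sized blocks together to form the parity block of one row."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            parity[i] ^= value
    return bytes(parity)


DATA_DISKS = 3
PARITY_DISK = DATA_DISKS  # fixed index: parity never moves to another disk

for lba in range(6):
    row, disk = raid4_locate(lba, DATA_DISKS)
    print(f"logical block {lba} -> row {row}, disk {disk}, parity on disk {PARITY_DISK}")

row_blocks = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
print(parity_block(row_blocks).hex())
```

In RAID 5 the parity disk index would change from row to row; here it is a constant, which is exactly what makes the single parity drive a shared resource for every write.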
RAID 4 requires at least three disks. What is more, it usually relies on hardware support for parity calculations. The parity makes it possible to recover data through the appropriate mathematical operations.
If we asked what RAID 4 is for, we would point out one particular need. Such a solution works very well with really large files, when data are read and written sequentially. Using RAID 4 for small portions of data would not be a good idea. The reason is that the parity block has to be modified for each write operation. Continuously repeating this operation would waste a lot of time and slow down the whole system.
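The cost of small writes can be seen in a sketch of the usual read-modify-write parity update. It assumes XOR parity and keeps one stripe row in a Python list; the function names are hypothetical.

```python
def xor_blocks(a, b):
    """XOR two equally sized blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(stripe, parity, disk, new_block):
    """Overwrite one data block and return the updated parity block.

    Only the old data block and the old parity are needed, but both must be
    read before the two writes, and the parity disk is touched every time.
    """
    old_block = stripe[disk]
    new_parity = xor_blocks(xor_blocks(parity, old_block), new_block)
    stripe[disk] = new_block
    return new_parity


stripe = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(xor_blocks(stripe[0], stripe[1]), stripe[2])
parity = small_write(stripe, parity, disk=1, new_block=b"\xff\x00")

# The incrementally updated parity matches a full recomputation of the row.
print(parity == xor_blocks(xor_blocks(stripe[0], stripe[1]), stripe[2]))
```

Every small write costs two reads and two writes, and one of each always lands on the same parity disk, whereas a large sequential write can compute the parity of a whole row once and write it in a single pass.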