Among the many technologies that improve the protection of stored data, replication is one of the most frequently mentioned.
With block-based or file-based replication, a mirrored copy of the data is created in a location different from that of the source data. With file-based replication there is an option to keep deleted files at the destination. File-based replication also allows limited versioning, achieved by running multiple replication tasks that copy the source data to several different locations. For example, every Monday the replication task copies the data to the first sub-directory, every Tuesday to the second, and so on. With such a setup the last 7 days are preserved, but the disadvantage is that significantly more storage space is required. This can be mitigated with deduplication.
Depending on the software, the target location is usually a remote computer, a portable disk connected to the local computer or, in the case of file-based replication, simply a directory (e.g. a mounted NFS or SMB share).
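The weekday rotation and the "keep deleted files" option described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular replication product works: the function names `mirror` and `replicate_for_day`, the `keep_deleted` parameter and the sub-directory naming scheme are all hypothetical, and the sketch copies whole files rather than only the changes.

```python
import datetime
import shutil
from pathlib import Path

# Hypothetical naming scheme: one sub-directory per weekday.
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def mirror(source: Path, target: Path, keep_deleted: bool = False) -> None:
    """Copy every file from source to target.

    Unless keep_deleted is set, files that no longer exist on the
    source are removed from the target as well. Empty directories
    are neither copied nor cleaned up in this sketch.
    """
    target.mkdir(parents=True, exist_ok=True)
    source_files = {p.relative_to(source)
                    for p in source.rglob("*") if p.is_file()}
    for rel in source_files:
        dest = target / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, dest)
    if keep_deleted:
        return
    # Plain mirroring: a deletion on the source is repeated on the target.
    for path in list(target.rglob("*")):
        if path.is_file() and path.relative_to(target) not in source_files:
            path.unlink()


def replicate_for_day(source: Path, base: Path, day: datetime.date) -> Path:
    """Mirror source into the sub-directory that belongs to the given weekday."""
    index = day.weekday()  # Monday is 0
    target = base / f"{index + 1}-{WEEKDAYS[index]}"
    mirror(source, target)
    return target
```

A daily scheduled task could then call, for example, `replicate_for_day(Path("/data"), Path("/mnt/replica"), datetime.date.today())`, where both paths are placeholders. Each weekday's copy is overwritten one week later, which is why only the last 7 days are preserved.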
Now, let’s analyze three days in the life of an administrator who wants to use replication as a backup.
During data replication, all data is synchronized. In the situation described above, the administrator could not find a copy of the files deleted from the source server because, once Mrs. Christie had deleted them, their absence was replicated to the target server as well. In other words, with block-based replication, deleting the files from the source server also removed them from the target server. Moreover, with block-based replication, any errors that occur in the hardware layer (e.g. a damaged disk) can damage the file system, and that damage is replicated to the backup server. In the end, you can accidentally become the owner of two damaged and useless mirrored copies. The same applies to file-based replication: when hardware-layer or file system errors occur, the damaged files are replicated. Let’s take a look at the next note from the diary of our unlucky administrator.
Yes, replication by itself does not perform data versioning; it simply copies changes from one place to another. Keeping multiple versions of files is the domain of traditional backup or snapshot technology. Versioning of replicated data is relatively easy to achieve if we combine replication with one of these technologies (or both). For example, when snapshots are taken at one-hour intervals, it is possible to restore your data to its state at the required time, to within an hour. And when we combine backup with replication, snapshots or both, the leeway expands even more.
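To illustrate the hourly-snapshot example, the following sketch picks the snapshot to restore from for a requested point in time. It is a simplified illustration only: the function name `restore_point` is hypothetical, and snapshots are represented merely by their timestamps rather than by any real snapshot API.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def restore_point(snapshots: Sequence[datetime],
                  wanted: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before the wanted time.

    Returns None when the wanted time is earlier than every snapshot,
    i.e. when no restore point exists yet.
    """
    ordered = sorted(snapshots)
    position = bisect_right(ordered, wanted)
    return ordered[position - 1] if position else None
```

With snapshots taken every hour, a request for the state at 10:45 resolves to the 10:00 snapshot, so at most one hour of changes is lost.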
And that is how the story of our administrator ends. Of course, it has been presented in an oversimplified way. Pleased with the ‘happy ending’ of this story, I shall move on to some conclusions.
Replication can be compared (assuming constant synchronization) to a kind of RAID 1 over the network. If one of the replicated servers fails, replication can protect us from data loss. However, it is not a backup. Nonetheless, we can ‘arm’ the replicated data by backing it up as well. Thanks to this, we gain fully usable backup copies of the data from the server as well as a mirror copy of all data on two servers.
In future articles you will be able to read about using simple file-based replication with file versioning.