In a nutshell: Data (File) Replication, Snapshots and Backup
The capacity of storage systems keeps growing, providing more space and longer retention times for all sorts of data produced and processed by businesses.
Accumulating data raises questions about its safety. Protecting against storage failure and introducing mechanisms that prevent data loss are among the matters that must be carefully planned.
There are various techniques you can choose to protect your data and storage. Here, we give a general overview of the following:
- Data (File) replication
- Snapshots
- Backup
All of the above techniques are featured in Open-E software.
Data (File) Replication
Data replication is a service that copies files and folders from a primary storage (production) to a secondary storage (replica). File-level replication is asynchronous: files are first written to the primary storage, and changes are then collected and replicated to the secondary storage unit according to a defined schedule or time interval. Because data is copied with a delay, a failure of the primary storage can cause data loss if the most recent modifications have not yet been transferred to the secondary storage.
File replication can be used to restore the content of the production storage unit after a failure. In practice, if the primary storage is damaged or completely destroyed and its content cannot be restored, data can be transferred back from the secondary storage (replica) once the primary unit is replaced or repaired.
The replication process can use different methods to transfer data from the source (primary storage system) to the destination (secondary storage system). Data can be replicated locally within the system, or over a local area network (LAN) or a wide area network (WAN).
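To make the asynchronous, schedule-based behaviour concrete, here is a minimal sketch of a single replication pass, assuming change detection by file modification time. The function `replicate_changes` and its parameters are hypothetical and illustrate the idea only; they are not how Open-E implements replication.

```python
import os
import shutil


def replicate_changes(primary: str, replica: str, last_sync: float) -> list[str]:
    """One scheduled pass: copy files modified after `last_sync` (Unix time).

    Hypothetical helper for illustration. Deletions on the primary are not
    propagated, and change detection relies only on modification time.
    """
    copied = []
    for root, _dirs, files in os.walk(primary):
        for name in files:
            src = os.path.join(root, name)
            if os.path.getmtime(src) <= last_sync:
                continue  # unchanged since the previous pass, skip it
            rel = os.path.relpath(src, primary)
            dst = os.path.join(replica, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(rel)
    return sorted(copied)
```

A scheduler would call this every few minutes, passing the time of the previous pass. Anything written to the primary after the last completed pass exists only there, which is exactly the data-loss window described above.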
Snapshots
Snapshots capture the state of data at a particular point in time. Some snapshots are created instantly, and their content is immediately available to applications that require read-only data. These include data protection applications such as backup or replication services, as well as reporting applications that can rely on a read-only data repository.
The most common methods of taking snapshots offered by storage systems include:
- Copy-On-Write (COW) – when a snapshot is initiated, only metadata about the location of the original data is copied. After that, the first time a block is modified, its original content is copied to the snapshot area before the new data is written to the original location.
- Redirect-On-Write sets aside a separate storage area when a snapshot is initiated and then writes all changes to the data into that area, while the original blocks are kept unchanged.
- Split-Mirror creates a snapshot of a whole storage entity, such as a volume or LUN, on an entity of the same size. In other words, it creates an exact copy of the data. Its important advantage over the other types is that the whole data set becomes available offline. While Copy-On-Write and Redirect-On-Write snapshots can be taken instantaneously, a Split-Mirror snapshot takes longer because of the amount of data that has to be duplicated.
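The difference between the three methods is easiest to see at the block level. The following is a simplified in-memory sketch, assuming a volume is just a list of blocks; the class and function names (`CowVolume`, `RowVolume`, `split_mirror`) are hypothetical, and real implementations also handle metadata, space allocation and consistency, which are omitted here.

```python
class CowVolume:
    """Copy-On-Write: taking a snapshot copies nothing; an original block is
    saved to the snapshot area the first time it is overwritten."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snap_area = None  # block index -> original content

    def take_snapshot(self):
        self.snap_area = {}  # instant: only bookkeeping, no data copied

    def write(self, i, data):
        if self.snap_area is not None and i not in self.snap_area:
            self.snap_area[i] = self.blocks[i]  # extra copy before overwrite
        self.blocks[i] = data  # new data lands in the original location

    def read_live(self, i):
        return self.blocks[i]

    def read_snapshot(self, i):
        # Blocks never overwritten since the snapshot are shared with the live data.
        return self.snap_area.get(i, self.blocks[i])


class RowVolume:
    """Redirect-On-Write: original blocks stay frozen; changes are written
    to a separate area instead."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.redirected = None  # block index -> new content

    def take_snapshot(self):
        self.redirected = {}

    def write(self, i, data):
        if self.redirected is None:
            self.blocks[i] = data  # no snapshot active: write in place
        else:
            self.redirected[i] = data  # original block left unchanged

    def read_live(self, i):
        if self.redirected is not None and i in self.redirected:
            return self.redirected[i]
        return self.blocks[i]

    def read_snapshot(self, i):
        return self.blocks[i]


def split_mirror(blocks):
    """Split-Mirror: a full, same-sized copy; the time taken grows with size."""
    return list(blocks)
```

Note the trade-off the sketch exposes: COW pays an extra copy on the first write to each block, ROW avoids that copy but must look up redirected blocks on every live read, and Split-Mirror pays the full copy up front but yields an independent data set.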
Backup
Backup is the process of copying and archiving data. Information preserved by a backup can be restored after a storage failure, or simply consulted when historical information is needed. It is important that a backup includes at least one full copy of the data repository to be preserved. Subsequent changes to the data can be captured using the different methods defined in the backup policy.
Types of backup:
- Full is an exact copy of the data that should be preserved.
- Differential contains the changes made since the last full backup and requires at least one full backup as the point of reference for data restoration.
- Incremental contains the changes made since the last backup was taken. The previous backup (full or incremental) is the point of reference. Importantly, a full restore of the system requires the initial full backup as well as the whole series of incremental backups taken since then.
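The sketch below illustrates how the three backup types differ in what they copy and in what a restore requires. It assumes each file is represented only by its modification time; `select_files` and `restore_chain` are hypothetical helpers written for this illustration, not part of any backup product.

```python
def select_files(mtimes: dict[str, float], kind: str,
                 last_full: float, last_backup: float) -> list[str]:
    """Return the files a backup of the given kind would copy.

    mtimes: file name -> last modification time (hypothetical input).
    last_full: time of the previous full backup.
    last_backup: time of the most recent backup of any kind.
    """
    if kind == "full":
        return sorted(mtimes)
    if kind == "differential":
        since = last_full  # all changes since the last FULL backup
    elif kind == "incremental":
        since = last_backup  # only changes since the previous backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, t in mtimes.items() if t > since)


def restore_chain(history: list[tuple[str, str]]) -> list[str]:
    """Given (backup_id, kind) pairs in chronological order, return the
    backups needed, oldest first, to restore the most recent state."""
    chain = []
    covered_by_differential = False
    for backup_id, kind in reversed(history):
        if kind == "full":
            chain.append(backup_id)
            return chain[::-1]
        if covered_by_differential:
            continue  # older changes are already contained in the differential
        chain.append(backup_id)
        if kind == "differential":
            covered_by_differential = True
    raise ValueError("no full backup in history: nothing to restore from")
```

For example, after a full backup `F1` followed by incrementals `I1` and `I2`, `restore_chain` returns `["F1", "I1", "I2"]`, while after `F1` followed by differentials `D1` and `D2` it returns only `["F1", "D2"]`. This is the usual trade-off: incrementals are smaller to take, differentials are simpler to restore.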
Backup can preserve data on a variety of destinations:
- Tape drives/Tape libraries
- Hard drives (USB, SATA, ATA)
- Network storage (iSCSI, FC)
As a data source, backup can use a snapshot taken before the procedure starts. This preserves data from a specific point in time, interferes less with normal storage operations, and has less impact on the production environment.
One of the main purposes of data backup is Disaster Recovery.
Backup allows a full copy of data to be stored in an off-site facility, so that all systems and data can be restored from scratch in the event of a total system failure or if servers are physically destroyed. Backups are also useful in less critical situations, for example to restore files that individual users removed intentionally or accidentally.
Each method of protecting your data has its pros and cons, and some are more expensive to implement than others. The choice of a particular method usually depends on the budget, as well as on the RTO (Recovery Time Objective) and RPO (Recovery Point Objective) required by a company and defined in its BC (Business Continuity) or DR (Disaster Recovery) plan.
You can browse our how-to resources to learn how each technique operates in Open-E products:
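As a rough illustration of how RPO and RTO targets constrain these choices, the sketch below estimates both for a schedule-based method. It assumes that each run captures a consistent point in time when it starts and that the worst case is a failure just before the next run completes; the function names and all numbers are hypothetical, and real estimates should be based on measured transfer and restore times.

```python
def worst_case_data_loss(interval_min: float, run_duration_min: float) -> float:
    """Approximate worst-case loss window (minutes) for a periodic method.

    Assumes a run captures its point in time at the start and the failure
    happens just before the next run finishes.
    """
    return interval_min + run_duration_min


def estimated_restore_time(restore_minutes_per_backup: list[float]) -> float:
    """Rough restore time: every backup in the restore chain is applied in turn."""
    return sum(restore_minutes_per_backup)


def meets_objectives(interval_min: float, run_duration_min: float,
                     chain_restore_minutes: list[float],
                     rpo_target_min: float, rto_target_min: float) -> tuple[bool, bool]:
    """Return (RPO met, RTO met) for the given schedule and restore chain."""
    rpo_ok = worst_case_data_loss(interval_min, run_duration_min) <= rpo_target_min
    rto_ok = estimated_restore_time(chain_restore_minutes) <= rto_target_min
    return rpo_ok, rto_ok
```

For instance, a nightly incremental backup (1440-minute interval, 60-minute run) whose restore chain is one full backup taking 240 minutes plus six incrementals at 20 minutes each gives `meets_objectives(1440, 60, [240] + [20] * 6, 240, 480) == (False, True)`: the 8-hour RTO target is met, but the 4-hour RPO target is not, which would point towards more frequent replication or snapshots.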