Data deduplication makes disk cost-effective for backup because it greatly reduces the amount of disk required by storing only the unique bytes or blocks from backup to backup. Over an average backup retention period, deduplication will use about 1/10th to 1/50th of the disk, depending on the mix of data types. On average, the deduplication ratio is 20:1.
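To make the idea of storing only unique blocks concrete, here is a minimal sketch in Python. It assumes fixed-size blocks and SHA-256 fingerprints, which are illustrative choices and not a description of any vendor's algorithm; the names `dedup_store`, `reassemble`, and `dedup_ratio` are hypothetical.

```python
import hashlib


def dedup_store(backups, block_size=4):
    """Store each unique block once and keep a per-backup list of fingerprints.

    Fixed-size blocks and SHA-256 are assumptions made for illustration only.
    """
    store = {}    # fingerprint -> block bytes (each unique block stored once)
    recipes = []  # one ordered list of fingerprints per backup
    for data in backups:
        recipe = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            store.setdefault(fingerprint, block)  # skip blocks already stored
            recipe.append(fingerprint)
        recipes.append(recipe)
    return store, recipes


def reassemble(store, recipe):
    """Rebuild a full backup from its list of fingerprints."""
    return b"".join(store[fingerprint] for fingerprint in recipe)


def dedup_ratio(backups, store):
    """Logical bytes backed up divided by physical bytes actually stored."""
    logical = sum(len(data) for data in backups)
    physical = sum(len(block) for block in store.values())
    return logical / physical
```

With two toy backups that share most of their blocks, such as `b"AAAABBBBCCCC"` and `b"AAAABBBBDDDD"`, only four unique blocks are kept for six blocks written. Real backup streams repeat far more data across a retention period, which is where the much higher ratios come from.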
All vendors need to offer data deduplication to reduce the amount of disk enough that its cost is about the same as tape. However, how deduplication is implemented changes everything about backup. Data deduplication reduces both the amount of storage and the amount of data replicated, saving storage and bandwidth costs. If it is not implemented correctly, however, it creates three new compute problems: slower backup performance (a longer backup window), slower restores and VM boots, and a backup window that grows as data grows instead of staying fixed in length.
Deduplication in backup software is typically performed on the client or agent, on the media server, or both.
The deduplication ratio for most backup software averages 2:1 to 8:1, depending on the vendor, much lower than the 20:1 of dedicated hardware appliances. Because the client and media server hardware is not dedicated to deduplication, software vendors typically employ less aggressive deduplication algorithms. This means that anywhere from 2.5X to 10X the storage is required to store the same retention periods as a dedicated appliance. Implementations with a lower deduplication ratio also use far more WAN bandwidth. At 3 to 4 weeks of retention, the amount of storage and bandwidth will probably be workable; however, if you are keeping many weeks, months, or years of retention, the cost of storage and bandwidth with deduplication in the backup software is far too high. In addition, backup software deduplicates inline, during the backup process. Deduplication is compute-intensive, so it slows backups down and lengthens the backup window. Furthermore, because deduplication occurs inline, all the data on disk is deduplicated and must be put back together, or “rehydrated,” for every request. Local restores, instant VM recoveries, audit copies, tape copies, and all other requests can take hours to days. Finally, these solutions add only disk as data grows. Because no additional compute resources are added, the backup window expands as data grows until it becomes too long, and the media server then has to be replaced with a bigger, faster, and more expensive one.
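The storage comparison above is simple arithmetic, sketched here in Python. It assumes physical storage scales inversely with the deduplication ratio and ignores compression, metadata, and other overhead; the function names and the 500 TB figure are hypothetical.

```python
def physical_storage_tb(logical_tb, dedup_ratio):
    """Disk needed to hold `logical_tb` of retained backups at a given ratio."""
    return logical_tb / dedup_ratio


def storage_multiplier(appliance_ratio, software_ratio):
    """How many times more disk the lower-ratio approach needs.

    Simplifying assumption: storage is inversely proportional to the ratio.
    """
    return appliance_ratio / software_ratio


if __name__ == "__main__":
    logical_tb = 500  # hypothetical amount of retained backup data
    for software_ratio in (2, 3, 4, 6, 8):
        needed = physical_storage_tb(logical_tb, software_ratio)
        multiple = storage_multiplier(20, software_ratio)
        print(f"{software_ratio}:1 -> {needed:.1f} TB, "
              f"{multiple:.1f}X the disk of a 20:1 appliance")
```

Because replicated data shrinks by the same ratio, the same multiplier is a reasonable first estimate for the extra WAN bandwidth as well.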
ExaGrid understands that deduplication is required, but how you implement it changes everything in backup. ExaGrid has a unique landing zone where backups land straight to disk without any inline processing. Backups are fast and the backup window is short; ExaGrid is typically 3X faster for backup ingest. Deduplication and offsite replication occur in parallel with the backups, using otherwise unused resources, for a strong recovery point objective (RPO). They never impede the backup process because they always run at second-order priority. ExaGrid calls this “adaptive deduplication.” Since backups write directly to the landing zone, the most recent backups are kept in their full, undeduplicated form, ready for any request. Local restores, instant VM recoveries, audit copies, tape copies, and all other requests do not require rehydration and are as fast as disk. As an example, instant VM recoveries occur in seconds to minutes, versus hours for the inline deduplication approach. ExaGrid provides full appliances (processor, memory, bandwidth, and disk) in a scale-out system. As data grows, all resources are added: additional landing zone, bandwidth, processor, and memory as well as disk capacity. The backup window stays fixed in length regardless of data growth, which eliminates expensive server upgrades. Unlike the inline, scale-up approach, where you need to guess how much server hardware and storage is required, the ExaGrid approach allows you to pay as you grow by adding appropriately sized appliances as your data grows. ExaGrid has eight appliance models, and appliances of any size or age can be mixed and matched in a single system, which allows IT departments to buy compute and capacity as they need it. This evergreen approach also eliminates product obsolescence.
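The difference between scale-up and scale-out behavior can be illustrated with a toy model in Python. It is a deliberately simplified sketch, not ExaGrid's sizing method: it assumes the backup window is just data volume divided by aggregate ingest rate, and every number and function name is hypothetical.

```python
import math


def scale_up_window_hours(data_tb, ingest_tb_per_hour):
    """Scale-up: one fixed media server, so ingest rate stays constant."""
    return data_tb / ingest_tb_per_hour


def scale_out_window_hours(data_tb, appliance_capacity_tb,
                           appliance_ingest_tb_per_hour):
    """Scale-out: each added appliance brings its own ingest resources."""
    appliances = math.ceil(data_tb / appliance_capacity_tb)
    return data_tb / (appliances * appliance_ingest_tb_per_hour)


if __name__ == "__main__":
    for data_tb in (100, 200, 400):
        up = scale_up_window_hours(data_tb, ingest_tb_per_hour=10)
        out = scale_out_window_hours(data_tb, appliance_capacity_tb=50,
                                     appliance_ingest_tb_per_hour=5)
        print(f"{data_tb} TB: scale-up {up:.0f} h, scale-out {out:.0f} h")
```

In this model the scale-up window grows linearly with data, while the scale-out window stays flat because compute and bandwidth are added in step with capacity.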
ExaGrid thought through the implementation of data deduplication and created an architecture that provides the fastest backups, restores, recoveries, and tape copies; keeps the backup window fixed as data grows; and eliminates forklift upgrades and obsolescence, while allowing IT staff to buy what they need as they need it. There is no downside, only upside. ExaGrid has taken the stress out of backup storage with 3X the backup performance, up to 20X the restore and VM boot performance, and a backup window that stays fixed in length as data grows.