Most IT departments prefer to back up to disk rather than tape. Data deduplication enables the cost-effective use of disk because it reduces the amount of disk required by storing only the unique bytes or blocks from backup to backup. Over an average backup retention period, deduplication will use about 1/10th to 1/50th of the disk capacity that undeduplicated backups would need, depending on the mix of data types. On average, the deduplication ratio is 20:1.
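To make the idea of storing only unique blocks concrete, here is a minimal sketch of block-level deduplication. It assumes fixed 4 KB blocks, SHA-256 fingerprints, and synthetic backup data; the function names `dedup_store` and `make_backups` are hypothetical and do not describe any vendor's implementation.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, for illustration only


def dedup_store(backups, block_size=BLOCK_SIZE):
    """Return (logical_bytes, stored_bytes) after keeping each distinct block once."""
    store = {}    # SHA-256 digest -> block contents
    logical = 0   # bytes the backup application sent
    for data in backups:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            logical += len(block)
            # A block already seen in an earlier backup is not stored again.
            store.setdefault(hashlib.sha256(block).digest(), block)
    stored = sum(len(block) for block in store.values())
    return logical, stored


def make_backups(count=20, blocks=100, block_size=BLOCK_SIZE):
    """Synthetic data: `count` full backups, each differing from the first by one block."""
    base = [bytes([i]) * block_size for i in range(blocks)]
    backups = [b"".join(base)]
    for n in range(1, count):
        changed = list(base)
        changed[n] = bytes([blocks + n]) * block_size  # one new, unique block
        backups.append(b"".join(changed))
    return backups


logical, stored = dedup_store(make_backups())
print(f"deduplication ratio: {logical / stored:.1f}:1")
```

In this synthetic run, 2,000 logical blocks reduce to 119 stored blocks, a ratio of roughly 16.8:1. Real ratios depend on the data types and the retention period, as noted above.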
Vendors need to offer data deduplication to reduce the amount of disk required and bring the cost of disk close to that of tape. However, how deduplication is implemented changes everything about backup. Data deduplication reduces both the amount of storage and the amount of data replicated, saving costs in storage and bandwidth. If it is not implemented correctly, however, deduplication creates three new compute problems that greatly affect backup performance (the backup window), restore and VM boot times, and whether the backup window stays fixed or grows as data grows.
The traditional approach deduplicates backups “inline,” or during the backup process. Deduplication is compute intensive and inherently slows backups, resulting in a longer backup window. Some vendors put software on the backup servers in order to use additional compute to help keep up, but this steals compute from the backup environment. If you divide a product's specified full backup size by its published ingest rate, the products with inline deduplication cannot keep up with their own rated capacity. All of the deduplication in the backup applications is inline, and all the large-brand deduplication appliances also use the inline approach. All of these products slow down backups, resulting in a longer backup window.
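The comparison of ingest rate against full backup size is simple arithmetic. The sketch below assumes a hypothetical helper, `backup_window_hours`, and made-up figures; substitute the numbers from a vendor's data sheet to run the same check.

```python
def backup_window_hours(full_backup_tb, ingest_tb_per_hour):
    """Hours needed to ingest one full backup at a sustained ingest rate."""
    if ingest_tb_per_hour <= 0:
        raise ValueError("ingest rate must be positive")
    return full_backup_tb / ingest_tb_per_hour


# Hypothetical figures, not taken from any real data sheet:
# a 100 TB full backup at 10 TB/hr needs a 10-hour window.
print(backup_window_hours(100, 10))  # 10.0
```

If the result is longer than the window the environment can tolerate, the product cannot ingest its own rated full backup in time.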
In addition, if deduplication occurs inline, then all of the data on the disk is deduplicated and needs to be put back together, or “rehydrated,” for every request. This means that local restores, instant VM recoveries, audit copies, tape copies, and all other requests can take hours to days. Most environments need VM boot times of single-digit minutes; however, with a pool of deduplicated data, a VM boot can take hours due to the time it takes to rehydrate the data. The backup applications that deduplicate, as well as the large-brand deduplication appliances, store only deduplicated data. All of these products are very slow for restores, offsite tape copies, and VM boots.
Furthermore, these solutions employ a scale-up architecture with a front-end controller and disk shelves. As data grows, only disk shelves are added, so the backup window expands until it becomes too long and the front-end controller must be replaced with a bigger, faster, and more expensive one, a process known as a “forklift upgrade.” All of the backup applications and large-brand deduplication appliances use the scale-up approach, whether in software or in a hardware appliance. With all of these solutions, as the data grows, the backup window grows as well.
ExaGrid has implemented deduplication in a way that provides the best possible reduction of data while still maintaining the highest possible backup performance. Each ExaGrid appliance has a unique landing zone where backups can land straight to disk without any inline processing, so backups are fast and the backup window is short. ExaGrid is typically 3X faster for backup ingest. Deduplication and offsite replication occur in parallel with the backups for a strong RPO (recovery point objective) and never impede the backup process, as they are always a second-order priority. ExaGrid calls this “adaptive deduplication.”
Since backups write directly into the landing zone, the most recent backups are in their full undeduplicated form, ready for any restore request. Local restores, instant VM recoveries, audit copies, tape copies, and all other requests do not require rehydration and are as fast as disk. As an example, instant VM recoveries occur in seconds to minutes versus hours when using the inline deduplication approach.
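The difference between restoring from a landing zone and restoring from a deduplicated pool can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyBackupTarget`, its method names, and the fixed 4 KB block size are hypothetical, deduplication is shown as a separate later step rather than running in parallel, and nothing here represents ExaGrid's actual implementation.

```python
import hashlib


class ToyBackupTarget:
    """Toy model: recent backups stay whole; older ones exist only as deduplicated blocks."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.landing_zone = {}  # backup name -> full, undeduplicated bytes
        self.chunks = {}        # SHA-256 digest -> block contents
        self.recipes = {}       # backup name -> ordered list of digests

    def ingest(self, name, data):
        # No hashing in the ingest path: the backup lands as-is.
        self.landing_zone[name] = data

    def deduplicate(self, name):
        # Runs off the ingest path, after the backup has landed.
        data = self.landing_zone[name]
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            self.chunks.setdefault(digest, block)
            recipe.append(digest)
        self.recipes[name] = recipe

    def evict(self, name):
        # Only drop the full copy once a deduplicated copy exists.
        if name not in self.recipes:
            raise KeyError(f"{name} has not been deduplicated yet")
        del self.landing_zone[name]

    def restore(self, name):
        if name in self.landing_zone:
            return self.landing_zone[name], "landing_zone"  # no rehydration
        blocks = (self.chunks[d] for d in self.recipes[name])
        return b"".join(blocks), "rehydrated"               # reassembled block by block


target = ToyBackupTarget()
target.ingest("monday", b"a" * 10000)
target.deduplicate("monday")
print(target.restore("monday")[1])  # landing_zone
target.evict("monday")
print(target.restore("monday")[1])  # rehydrated
```

The restore path for a recent backup is a single read of the full copy, while an evicted backup must be reassembled from its blocks, which is the rehydration cost described above.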
ExaGrid provides full appliances (processor, memory, bandwidth, and disk) in a scale-out GRID. As data grows, all resources are added, including additional landing zone, bandwidth, processor, and memory as well as disk capacity. This keeps the backup window fixed in length regardless of data growth, which eliminates expensive forklift upgrades. Unlike the inline, scale-up approach, where you need to guess what size of front-end controller is required, the ExaGrid approach allows you to simply pay as you grow by adding appropriately sized appliances as your data grows. ExaGrid offers eight appliance models, and appliances of any size or age can be mixed and matched in a single GRID, which allows IT departments to buy compute and capacity as they need it. This approach also eliminates product obsolescence.
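The effect of the two architectures on the backup window can be modeled with a few lines of arithmetic. The sketch below is a simplification with made-up numbers: it assumes ingest rate is the only bottleneck, that a scale-up controller has a fixed ingest rate, and that each scale-out appliance adds both capacity and ingest. The function names are hypothetical.

```python
import math


def scale_up_window(data_tb, controller_ingest_tb_per_hour):
    """Fixed controller: adding shelves adds capacity, not ingest."""
    return data_tb / controller_ingest_tb_per_hour


def scale_out_window(data_tb, capacity_per_appliance_tb, ingest_per_appliance_tb_per_hour):
    """Each appliance adds capacity and ingest, so total ingest tracks data size."""
    appliances = math.ceil(data_tb / capacity_per_appliance_tb)
    return data_tb / (appliances * ingest_per_appliance_tb_per_hour)


# Hypothetical figures: a 10 TB/hr controller versus 50 TB appliances at 5 TB/hr each.
for data_tb in (100, 200, 400):
    up = scale_up_window(data_tb, 10)
    out = scale_out_window(data_tb, 50, 5)
    print(f"{data_tb} TB: scale-up {up:.0f} h, scale-out {out:.0f} h")
```

Under these assumptions the scale-up window doubles each time the data doubles, while the scale-out window never exceeds the per-appliance capacity divided by the per-appliance ingest rate (10 hours here).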
When architecting its appliances, ExaGrid thought through the implementation of data deduplication as it relates to backup and designed an optimized architecture that provides the fastest backups, restores, recoveries, and tape copies; permanently fixes the backup window length, even as data volumes grow; and eliminates forklift upgrades and product obsolescence, all while allowing IT staff the flexibility to buy what they need as they need it. ExaGrid’s appliances deliver three times the backup performance, five to ten times the restore and VM boot performance, and a backup window that stays fixed in length as data grows.