BANGALORE, INDIA: At a time when digital data over the metro is doubling every 18 months, data deduplication has emerged as the latest buzz in the storage industry. Data deduplication, or dedupe, removes redundant data over the metro.
"Redundant data is identified using hashing (the process of identifying unique segments of blocks) on the client, or in larger environments, a local backup can be taken and only unique data sent to central site. The latter method is efficient because restore from recent backups is local, not over the WAN. So it's much quicker to recover than if it had to be carried over the WAN," says Vivek Anand, regional director, India & SAARC, CommVault, in an interview to CIOL, where he shares his views on deduplication. Excerpts:
CIOL: Is space optimization through data dedupe the same in all circumstances?
Vivek Anand: Categorically not. Data deduplication comes in many forms and should not be seen only as a gain to be had at the end point (or dedupe appliance). If you consider the lifespan of data, it typically begins at a client and finally comes to rest in a long-term store - be that local tape, vaulted tape or even an online archive.
Before the data finishes its journey, it is stored and held for variable periods of time in different locations and on different media to satisfy recovery or access criteria (defined by regulatory requirements or business practice). If we isolate deduplication to an appliance, then we only gain the benefit of deduplication at a single point and for a limited period of time. We also miss the opportunity to rationalize data where it makes sense (i.e. before it is transmitted over a network, stored on a device other than a dedupe appliance, etc.).
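To illustrate what deduplicating at the client, before data crosses the network, can look like, here is a minimal Python sketch. It is not CommVault's implementation: the fixed 4 KB block size, the use of SHA-256, and the function names client_side_dedupe and restore are all assumptions made for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed block size; real products may segment data differently


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield consecutive fixed-size blocks of the input."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


def client_side_dedupe(data: bytes, known_hashes: set):
    """Hash each block on the client and keep only blocks the central site lacks.

    Returns (recipe, new_chunks):
      recipe     - ordered list of digests needed to rebuild the data
      new_chunks - digest -> block bytes that must actually be transmitted
    """
    recipe = []
    new_chunks = {}
    for chunk in split_into_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        if digest not in known_hashes and digest not in new_chunks:
            new_chunks[digest] = chunk
    return recipe, new_chunks


def restore(recipe, chunk_store: dict) -> bytes:
    """Rebuild the original data from its recipe and the stored blocks."""
    return b"".join(chunk_store[digest] for digest in recipe)


# Hypothetical central store, modelled here as a plain dictionary.
central_store = {}

first_backup = b"A" * 8192 + b"B" * 4096
recipe1, sent1 = client_side_dedupe(first_backup, set(central_store))
central_store.update(sent1)

second_backup = b"A" * 8192 + b"C" * 4096
recipe2, sent2 = client_side_dedupe(second_backup, set(central_store))
central_store.update(sent2)
```

In this made-up example the first backup consists of three blocks but only two are unique, so two are sent. The second backup shares its "A" blocks with the first, so only the single "C" block travels over the network, while restore can still rebuild either backup from the central store.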
Also, not all data is the same, and optimization ratios differ based on the data type. For example, video and image data are less likely to contain similar blocks than document and message data and, as such, will have a detrimental effect on dedupe ratios.
Customers also have a mixture of redundant and new data. Redundant data provides excellent deduplication ratios, whereas "net new" data reduces the overall ratio. The point is that customers are not the same in terms of overall requirements, data held and data requirements in the longer term.
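The effect of "net new" data on the overall ratio can be shown with a short calculation. The sketch below is illustrative only: the function name blended_ratio and the figures (8 TB of highly redundant data at 10:1 and 2 TB of net new data at 1:1) are invented for the example and are not taken from any customer or vendor.

```python
def blended_ratio(datasets):
    """Overall dedupe ratio for a mix of data sets.

    datasets: list of (logical_size, ratio) pairs, where ratio is the
    dedupe ratio achieved on that data set alone (1.0 means no savings).
    """
    total_logical = sum(size for size, _ in datasets)
    total_stored = sum(size / ratio for size, ratio in datasets)
    return total_logical / total_stored


# Hypothetical mix: sizes in TB, ratios per data set.
mix = [
    (8.0, 10.0),  # redundant data that deduplicates well
    (2.0, 1.0),   # "net new" data with no duplicate blocks
]

overall = blended_ratio(mix)
print(f"Overall ratio: {overall:.2f}:1")
```

With these assumed figures, 10 TB of logical data occupies 2.8 TB on disk, so the overall ratio is about 3.6:1 even though most of the data deduplicates at 10:1.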
Finally, when moving deduplicated data from disk media to tape to take advantage of long-term cost benefits, CommVault is the only vendor to migrate data "as is", without the need to reverse the deduplication process, which would result in additional storage requirements.
CIOL: What are the challenges in this space?
VA: The challenges come in many forms. Firstly, customers need to understand the net benefits of deduplication. A standard non-deduplication solution may be of more value. A number of the customers we've met in India have not yet decided to move to disk-based backup or archive in the first place.
As such, deduplication will be of limited value to them except potentially where remote data is being serviced - in which case CommVault would advise on mixed solutions.