KVS Ramarao's blog

Patent # 8,078,593: Dictionary Architecture and Methodology for Revision-tolerant Data De-duplication

I must confess - I was quite surprised when this deduplication (“dedupe”) related patent was granted barely two years after we filed it! That too in a field littered with patent applications – every other company seems to have filed one (and in some cases, many more).

Rabin algorithm – the mother of deduplication technologies

Michael Rabin’s 1981 paper simply says the following in its abstract:

  • “Randomly chosen irreducible polynomials p(t) ε Z2[t] are used to “fingerprint” bit-strings. This method is applied to produce a very simple real-time string matching algorithm and a procedure for securing files against unauthorized changes. The method is provably efficient and highly reliable for every input.”

It amazes me even now that this seemingly theoretical piece of work based on irreducible polynomials has come to become the basis for much of the present-day deduplication technology. Virtually every aspect of the vastly popular “chunking” algorithm and its many variants have their roots in this paper. Let’s take a quick glance at the unadorned “chunking” algorithm before proceeding further.

Pages

Subscribe to RSS - KVS Ramarao's blog
Content