Distributed Compression with Random Access


The amount of data generated in many applications such as astronomy and genomics has highlighted the growing need for compression schemes that allows to interact and manipulate data directly in the compressed domain. Classical compression schemes such as Lempel-Ziv are suboptimal in this regard since the recovery of even a single message symbol necessitates decompressing the entire source. In this talk, we will explore the problem of providing random access (also called local decodability or locality) in the compressed domain, where short fragments of data can be recovered without accessing the entire compressed sequence. It is known that lossless compression of a random source can be performed with a notion of strong locality at rates close to entropy; any individual source symbol can be decoded from a constant (independent of the length of the source sequence, n) number of compressed bits, with a vanishing in n probability of error. In contrast with the single source setup, we show that for two separately encoded correlated sources (a.k.a. the Slepian-Wolf problem), lossless compression and strong locality is generally not possible. More precisely, we show that for the class of "confusable'' sources strong locality cannot be achieved whenever one of the sources is compressed below its entropy. In this case, irrespectively of n, the probability of error of decoding any symbol is lower bounded by 2^{-O(d)}, where d denotes the number of compressed bits accessed by the local decoder. Conversely, if the source is not confusable, strong locality is possible even if one of the sources is compressed below its entropy. This is based on joint work with Aslan Tchamkerten and Venkat Chandar.

The Speaker