A Designing Robust Algorithms for Data Streams that Might be Adversarially Generated

# 200

Abstract

Modern data stream processing systems handle massive amounts of data that arrive continuously, requiring quick and extremely space efficient processing of each data item. The results of data analysis performed by such systems can drive decision making that affects future data streams fed to the same systems. Dealing with this feedback loop presents new challenges. The emerging area of adversarially robust streaming algorithms addresses this challenge; such algorithms are designed to produce correct outputs, even when each stream element can be chosen as an arbitrary (and possibly adversarial) function of earlier algorithm outputs. In this talk, we will outline basic results on adversarially robust algorithms to provide some necessary background and set the stage. This will lead up to two recent sets of results of ours; one on the Missing Item Finding (MIF) problem, and the other on graph coloring. In MIF, the algorithmic task is to determine items in a (massive) universe that are missing from the given input stream. Here, we prove the optimum space requirement depends strongly on what source of random bits the algorithm can access. In graph coloring, the task is to assign as few distinct colors as possible to the vertices of an input graph (specified as a stream of edges) so that no edge becomes monochromatic. We prove that, similar to the situation for MIF, the color budget achievable by a streaming algorithm depends strongly on the source of randomness available to the algorithm. Based on joint work with Prantar Ghosh and Manuel Stoeckl.

Amit Chakrabarti is a Professor in the Department of Computer Science at Dartmouth College. He holds a Ph.D. in Computer Science from Princeton University and a B.Tech. in Computer Science from the Indian Institute of Technology, Bombay, where he was the President of India Gold Medalist. Professor Chakrabarti's research is in the broad area of theoretical computer science. Specific interests are (1) complexity theory, especially communication complexity and applications of information theory, and (2) algorithms, including space-efficient algorithms for massive data streams and approximation techniques for optimization problems. He is particularly known as a pioneer of the concept of information complexity; this work was awarded a 20-year Test of Time Award by the IEEE Syposium on Foundations of Computer Science (FOCS). His research has been funded by several awards from the US National Science Foundation, including a CAREER award.