Sequential Clustering of Data Streams from Unknown Distributions
# 173
Abstract
In this talk, we consider the problem of clustering S data streams into K clusters based on the proximity of the unknown distributions that generate them. Each data stream consists of independent and identically distributed (i.i.d.) samples from an unknown distribution. We focus on the sequential setting, in which a new set of samples from the data streams is provided at each time step. We propose sequential nonparametric clustering tests for two cases: the number of clusters is known, or it is unknown. The proposed tests are universal in the sense that they do not depend on the underlying configuration of the distribution clusters or on the distributions themselves. In both cases, we show that the proposed tests stop in finite time almost surely and are universally exponentially consistent. Simulations show that the proposed sequential clustering tests outperform the corresponding fixed-sample-size tests in terms of the expected number of samples required for a given probability of error. This is joint work with Sreeram C. Sreenivasan.
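As a rough illustration of the known-K sequential setting, the following Python sketch clusters streams as samples arrive and decides when to stop. It is not the test proposed in the talk. The Kolmogorov-Smirnov distance, the single-linkage clustering step, the stopping rule (stop once the smallest inter-cluster distance exceeds the largest intra-cluster distance by more than c/sqrt(n)), and every function name and constant are assumptions chosen only for illustration. It assumes 2 <= K < S.

```python
import math
import random
from bisect import bisect_right, insort
from itertools import combinations


def ks_distance(a, b):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two sorted lists."""
    # The supremum of |F_a - F_b| is attained at one of the sample points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def linkage_gap(dist, c1, c2):
    """Smallest pairwise distance between two clusters of stream indices."""
    return min(dist[min(i, j), max(i, j)] for i in c1 for j in c2)


def single_linkage(dist, num_streams, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [{i} for i in range(num_streams)]
    while len(clusters) > num_clusters:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage_gap(dist, clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] |= clusters[q]  # p < q, so deleting q keeps p valid
        del clusters[q]
    return clusters


def sequential_cluster(sample_fn, num_streams, num_clusters,
                       c=2.0, min_samples=20, max_samples=500):
    """Draw one sample per stream per step; stop when clusters look well separated.

    sample_fn(s) is a hypothetical callback returning the next sample of stream s.
    Returns (clusters, number of samples used per stream).
    """
    data = [[] for _ in range(num_streams)]
    clusters = None
    for n in range(1, max_samples + 1):
        for s in range(num_streams):
            insort(data[s], sample_fn(s))  # keep each stream sorted
        if n < min_samples:
            continue
        dist = {
            (i, j): ks_distance(data[i], data[j])
            for i, j in combinations(range(num_streams), 2)
        }
        clusters = single_linkage(dist, num_streams, num_clusters)
        inter = min(
            (linkage_gap(dist, c1, c2) for c1, c2 in combinations(clusters, 2)),
            default=math.inf,
        )
        intra = max(
            (dist[i, j] for cl in clusters for i, j in combinations(sorted(cl), 2)),
            default=0.0,
        )
        # Assumed stopping rule: separation must beat a threshold shrinking in n.
        if inter - intra > c / math.sqrt(n):
            break
    return sorted(sorted(cl) for cl in clusters), n


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.0, 0.0, 4.0, 4.0]  # two streams per (hypothetical) Gaussian
    result, used = sequential_cluster(lambda s: rng.gauss(means[s], 1.0), 4, 2)
    print(result, used)
```

Because the stopping time adapts to how well separated the observed streams are, easy configurations stop early while harder ones keep sampling, which is the intuition behind comparing sequential tests with fixed-sample-size tests on the expected number of samples.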
Srikrishna Bhashyam received the B.Tech. degree in electronics and communication engineering from IIT Madras, India, in 1996, and the M.S. and Ph.D. degrees in electrical and computer engineering from Rice University, Houston, TX, USA, in 1998 and 2001, respectively. He was a Senior Engineer with Qualcomm Inc., Campbell, CA, USA, from 2001 to 2003, where he was involved in wideband code division multiple access modem design. Since 2003, he has been with IIT Madras. He is currently a Professor with the Department of Electrical Engineering. His research interests include communication and information theory, statistical signal processing, and wireless networks. He served as an Editor for the IEEE Transactions on Wireless Communications from 2009 to 2014. He has been an Editor of the IEEE Transactions on Communications since 2017.