Coresets for Machine Learning

# 123






Abstract

In the face of the data onslaught, smart algorithms have a significant role to play. Over the last couple of decades, coresets, which are small and efficiently computable data summaries, have grown in popularity in both theoretical and practical settings. They make it possible to approximately solve large optimization problems using only a fraction of the resources. In this talk, we will discuss a few recent results on constructing coresets for tensor factorization and Bregman clustering. Our coresets are online in nature, i.e., for every incoming point, the algorithm makes an irrevocable decision on whether to include it in the coreset. We will also discuss some in-progress results on constructing coresets with deterministic guarantees. The talk is based on joint work with Jayesh Choudhari, Supratim Shit, and Rachit Chhaya.

Anirban Dasgupta, IIT Gandhinagar.

Anirban Dasgupta is currently the N. Rama Rao Chair Professor of Computer Science & Engineering at IIT Gandhinagar. Before joining IIT Gandhinagar, he was a Senior Scientist at Yahoo Labs, Sunnyvale. Anirban works on algorithmic problems for massive data sets, large-scale machine learning, analysis of large social networks, and randomized algorithms in general. He completed his undergraduate studies at IIT Kharagpur and his doctoral studies at Cornell University. He has received the Google Faculty Research Award (2015), the Cisco University Award (2016), the ICDT Best Newcomer Award (2016), and the Google India AI/ML Award (2020).