The overall execution time of distributed matrix computations is often dominated by slow or failed worker nodes (also known as stragglers). Recently, ideas from coding theory have been adapted to these problems; they allow the intended result to be recovered as long as a minimum number (the threshold) of worker nodes complete their assigned tasks. In this talk, we will highlight some important issues with several prior works in the area. These include numerical instability in the recovery of the desired result and the failure to use work performed by slow workers (partial stragglers). Furthermore, several prior schemes can cause an undesirable increase in worker computation time when the input matrices are sparse. We will discuss some of our recent contributions that address these issues.
Aditya Ramamoorthy is the Northrop Grumman Professor of Electrical and Computer Engineering and (by courtesy) of Mathematics at Iowa State University. He received his B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Delhi, and his M.S. and Ph.D. degrees from the University of California, Los Angeles (UCLA). His research interests are in classical and quantum information theory and coding techniques, with applications to distributed computation, content distribution networks, and machine learning. Dr. Ramamoorthy served as an editor for the IEEE Transactions on Information Theory from 2016 to 2019 and for the IEEE Transactions on Communications from 2011 to 2015. He is the recipient of the 2020 Mid-Career Achievement in Research Award, the 2019 Boast-Nilsson Educational Impact Award, and the 2012 Early Career Engineering Faculty Research Award from Iowa State University, the 2012 NSF CAREER award, and the Harpole-Pentair professorship in 2009 and 2010.