Abstract: We consider the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for k=O(1) and works irrespective of the underlying distance measure as long as it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to permutations/rankings. We first talk about a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric. In fact, our framework is powerful enough to extend this result to the streaming model using "small" space. In this talk, we also describe other variants of the clustering problems defined over rankings.
Bio:
Diptarka is an assistant professor at the National University of Singapore (NUS). He did his Ph.D. at the Indian Institute of Technology, Kanpur. Before joining NUS, he spent two years at Charles University, Prague, and then almost a year at Weizmann Institute of Science, Israel as a post- doctoral fellow. His research interest mostly lies in theoretical computer science, more specifically, algorithms on large data sets, approximation algorithms, sublinear algorithms, string matching algorithms, and graph algorithms. He is a recipient of the best paper award at FOCS 2018 and the Google South & Southeast Asia Research Award 2022.
Abstract: We consider the classical metric k-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a 2-approximate solution in polynomial time for k=O(1) and works irrespective of the underlying distance measure as long as it is a metric; however, going below the 2-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to permutations/rankings. We first talk about a new algorithmic framework for clustering a set of permutations. Our first result is a 1.999-approximation algorithm for the metric k-median problem under the Ulam metric. In fact, our framework is powerful enough to extend this result to the streaming model using "small" space. In this talk, we also describe other variants of the clustering problems defined over rankings.
Bio:
Diptarka is an assistant professor at the National University of Singapore (NUS). He did his Ph.D. at the Indian Institute of Technology, Kanpur. Before joining NUS, he spent two years at Charles University, Prague, and then almost a year at Weizmann Institute of Science, Israel as a post- doctoral fellow. His research interest mostly lies in theoretical computer science, more specifically, algorithms on large data sets, approximation algorithms, sublinear algorithms, string matching algorithms, and graph algorithms. He is a recipient of the best paper award at FOCS 2018 and the Google South & Southeast Asia Research Award 2022.