Project Details
Statistical, Computational and Algorithmic Aspects of Kernel Clustering
Applicant
Professor Debarghya Ghoshdastidar, Ph.D.
Subject Area
Theoretical Computer Science
Mathematics
Mathematics
Term
since 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 499522214
Clustering is one of the fundamental problems in data analysis, and kernel clustering algorithms are among the most popular clustering techniques. Kernel clustering algorithms can find latent groups in complex data from pairwise similarities among the data, computed using a pre-defined kernel function. Kernel methods are based on the mathematical theory of reproducing kernels, which helps in theoretically analysing the statistical performance of these methods, particularly in the context of supervised machine learning. In comparison, there is little mathematical understanding of the role of kernels in clustering. This project addresses the deficit of understanding of kernel clustering. In particular, this project combines mathematical and numerical research to answer three key questions on kernel clustering: (1) What kind of underlying cluster structures can be statistically recovered by kernel clustering? (2) How do efficient kernel clustering algorithms perform on large datasets? (3) What is the connection between kernel clustering and other clustering approaches? To answer these questions, the project will provide theoretical research on the statistical guarantees of kernel clustering for planted cluster recovery, statistical-computational tradeoff of kernel clustering, and connections between kernel clustering and distribution based clustering. As a consequence of the theoretical study, the project will also develop new computationally efficient and interpretable clustering algorithms. Furthermore, given the recently established role of kernel theory in the analysis of supervised machine learning and deep learning, this project provides a building block for a deeper understanding of modern clustering and unsupervised learning algorithms.
DFG Programme
Research Grants