Theoretical analysis of practically important aspects of cluster analysis

Applicants Professor Dr. Heiko Röglin; Professorin Dr. Melanie Schmidt

Subject Area Theoretical Computer Science

Term from 2019 to 2023

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 416767905

Project Description

The area of cluster analysis is concerned with the search for unknown structure in data. The goal is to find hidden clusters, i.e. groups of points that belong together. Clustering has many applications in the area of data analysis and has thus been studied in various variations, leading to a huge body of clustering research.However, clustering also suffers from a large gap between theory and practice, even with respect to the objective functions that are studied. In theory, combinatorial problems like k-median and facility location are extremely well studied, while in practice methods for very different types of problems play a much larger role, e.g. for k-means clustering or for hierarchical clustering. Additionally, there can be additional demands on the solution, leading to constrained clustering problems.Our goal is the theoretical analysis of practically relevant questions in the area of cluster analysis. This includes the study of the approximability of more practical variants of clustering as well as the analysis of popular heuristics, e.g. under structural assumptions on the input data. Our project consists of two parts.1) Clustering with constraints:In the first part, we consider the effect of constraints on clustering problems. Examples for constraints are capacities, lower bounds, fairness constraints and outliers. First, we want to develop better approximation algorithms for clustering problems with constraints. Second, we want to analyze popular heuristics and develop practically efficient algorithms. In both cases, we are also interested in combining different constraints. Third, we want to study clustering problems with constraints in data streams.2) Hierarchical clustering:The second part considers hierarchical clustering. Instead of fixing a specific number of clusters k, hierarchical clustering computes a hierarchy of clusterings. Hierarchical clusterings play a major role in applications, yet they have been studied very little in theory. First, we want to analyze the approximation ratio of the very popular greedy heuristics for hierarchical clustering. Second, we want to develop approximation algorithms with small approximation ratios, which are not yet known for most of the variants of hierarchical clustering. We are also interested in lower bounds on the ratios of algorithms and lower bounds on the approximability itself. Third, we want to develop global objective functions for hierarchical clustering.

DFG Programme Research Grants

Servicenavigation

Hauptnavigation

Theoretical analysis of practically important aspects of cluster analysis

Additional Information

Servicenavigation

Hauptnavigation

Theoretical analysis of practically important aspects of cluster analysis

Additional Information

Textvergrößerung und Kontrastanpassung