Theoretical analysis of practically important aspects of cluster analysis
Final Report Abstract
This project’s area is the analysis of algorithms. It focuses on the design of practical approximation algorithms for cluster analysis and the analysis of known clustering heuristics. Cluster analysis is used to examine data for unknown structures. Clustering algorithms automatically divide a given set of objects into groups, known as clusters. In theoretical algorithmics, the quality of a clustering is evaluated using a mathematical objective function. Such an objective function could be the largest diameter or radius of any cluster in the partitioning. These two objectives are modeled by the k-diameter and the k-center objective function, respectively. Problems like the k-diameter and k-center problem, as well as related problems like the k-median and k-means problem, have been thoroughly investigated both theoretically and practically within the last decades. There are various approximation algorithms that provably compute good solutions for these objective functions. However, clustering problems in practice often occur in less idealized forms, making the above models insufficient. This project investigated practical extensions of clustering problems. Part 1 dealt with extending clustering problems by side constraints. Imagine households being divided into groups to decide the placement of public kindergartens. In addition to the distances of households to their nearest kindergarten, the capacity of the facilities also plays a role: The number of households assigned to a kindergarten should neither be too small nor too large. In addition to cardinality-based constraints, we also investigated fairness constraints, which are concerned with ensuring that all clusters are fair with respect to a sensitive attribute. We designed approximation algorithms and investigated the efficient practical calculation of clusterings under side constraints. In applications, it is often already a significant challenge to sensibly choose the number k of clusters for a given data set. Theoretical models (as those mentioned above) usually circumvent this problem by assuming that k is explicitly given, which is often unrealistic. Hierarchical clusterings solve this problem by providing a clustering hierarchy that contains a clustering with k clusters for each k. Although hierarchical clusterings play a major role in applications, they are not well understood theoretically. Our goal in Part 2 was to expand the theoretical understanding of hierarchical clustering algorithms to narrow the gap between theory and practice in this area. We designed and analyzed various algorithms for computing hierarchical clusterings and were able to show new bounds on how good the solutions are that these algorithms compute.
Publications
-
Noisy, Greedy and Not so Greedy k-Means++. In: Proceedings of 28th Annual European Symposium on Algorithms (ESA), u Band 173, S. 18:1–18:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
Anup Bhattacharya; Jan Eube; Heiko Röglin & Melanie Schmidt
-
Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means. In: Proceedings of 38th International Symposium on Theoretiu cal Aspects of Computer Science (STACS), S. 7:1–7:17. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021.
Anna Arutyunova & Melanie Schmidt
-
Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space. CoRR, abs/2106.07319, 2021.
Melanie Schmidt, Julian Wargalla
-
Upper and Lower Bounds for Complete Linkage in General Metric Spaces. In: Proceedings of 24th International Conference on Approximation Algorithms for Combinatorial Optimization Problems (APPROX), S. 18:1–18:22. Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2021.
Anna Arutyunova; Anna Großwendt; Heiko Röglin; Melanie Schmidt & Julian Wargalla
-
Applying Graph-based Clustering to Tracklet-Tracklet Correlation. In: Proceedings of the International Astronautical Congress, 2022. ISSN: 0074-1795.
Franziska Griese, Kathrin Rack, Simon Schmitz, Hauke Fiedler, Benjamin Hofmann, Melanie Schmidt, Daniel Schmidt
-
The Price of Hierarchical Clustering. In: Proceedings of 30th Annual European Symposium on Algorithms (ESA), Band 244, S. 10:1–10:14. Schloss Dagstuhl – u Leibniz-Zentrum für Informatik, 2022.
Anna Arutyunova & Heiko Röglin
-
Approximations for Hierarchical and Lower-Bounded Clustering and the Coma plexity of Minimum-Error Triangulation. Dissertation, Rheinische Friedrich-Wilhelms-Universit¨t Bonn, 2023.
Anna Arutyunova
-
How Does Fairness Affect the Complexity of Gerrymandering? In: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), S. 2869–2871. ACM, 2023.
Sandip Banerjee; Rajesh Chitnis & Abhiruk Lahiri
-
MRI Scan Synthesis Methods based on Clustering and Pix2Pix. CoRR, abs/2312.05176, 2023.
Giulia Baldini, Melanie Schmidt, Charlotte Zäske, Liliana L. Caldeira
