FADeBaC Sentiment Analysis - Fully Automatic DEnsity-BAsed Clustering applied to Sentiment Analysis

Applicants: Professor Dr. Hinrich Schütze; Professor Dr. Ingo Steinwart

Subject area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term: 2012 to 2018

Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 219327280

Final report year: 2018

Final Report Abstract

The project focused on three aspects of density-based clustering, namely:
• Development and statistical analysis of adaptive density-based clustering algorithms
• Foundations of clusterings with ground truth
• Implementation of density-based clustering algorithms

With regard to the first aspect, we achieved a significant breakthrough by designing the very first density-based clustering algorithm that provably adapts to various unknown, non-parametric properties of the data-generating distribution. In other words, we not only established the best possible convergence rates for a class of clustering algorithms that are given these properties of the distribution, but also designed a hyper-parameter selection strategy for these algorithms that achieves the same rates without knowing these properties. These algorithms are based on a generic clustering algorithm that only requires estimates of the density level sets with a certain control of their uncertainty. It turned out that a variety of density estimation methods enjoy such a control.

With regard to the second aspect, we found a set of axioms that, on the one hand, makes it possible to consider various geometric notions of clusterings for simple sets and, on the other hand, guarantees that each such notion of clustering can be uniquely extended to an axiom-preserving clustering notion for a large set of complicated distributions. As a consequence, we not only gave an axiomatic foundation of density-based clustering, but also identified several other notions of clustering that enjoy such an axiomatic foundation describing infinite-sample ground truth.

Finally, we implemented a new density-based clustering package in C/C++ that not only follows the statistical insights of the first aspect, but is also orders of magnitude faster than existing density-based clustering packages.
Moreover, it contains a fully automated hyper-parameter selection routine, and bindings to standard languages such as Python, R, and MATLAB are currently being written.
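The generic idea described above, namely raising a density level until the estimated level set splits into several connected pieces, can be illustrated with a short sketch. This is a simplified illustration loosely modeled on that idea, not the project's C/C++ package, and it may differ from the published algorithm in its details. It assumes a box-kernel density score (the fraction of samples within distance h of a point), Euclidean tau-connectivity between samples, and hand-picked values for the hypothetical parameters h, tau and eps; all function names are hypothetical, and the automatic hyper-parameter selection is not shown.

```python
import math
from itertools import combinations


def density_scores(points, h):
    """Box-kernel density score: fraction of samples within distance h.

    Proportional to a kernel density estimate with a uniform ball kernel;
    the normalising volume constant is dropped (simplifying assumption).
    """
    n = len(points)
    return [sum(1 for q in points if math.dist(p, q) <= h) / n for p in points]


def tau_components(indices, points, tau):
    """Connected components of the graph joining samples at distance <= tau."""
    parent = {i: i for i in indices}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(indices, 2):
        if math.dist(points[i], points[j]) <= tau:
            parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in indices:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def level_set_clustering(points, h, tau, eps):
    """Raise the level rho in steps of eps until the level set splits.

    At each level, keep the tau-connected components of the estimated
    level set {score >= rho} that still contain a sample of the higher
    level set {score >= rho + 2*eps}; the others are treated as noise.
    Returns (rho, clusters), where clusters is a list of sorted index
    lists, or (rho, []) if the level set vanishes without splitting.
    """
    scores = density_scores(points, h)
    rho = 0.0
    while True:
        level = [i for i, s in enumerate(scores) if s >= rho]
        upper = {i for i, s in enumerate(scores) if s >= rho + 2 * eps}
        comps = [c for c in tau_components(level, points, tau)
                 if any(i in upper for i in c)]
        if len(comps) != 1:  # split found (>= 2) or nothing left (0)
            return rho, [sorted(c) for c in comps]
        rho += eps
```

For example, with the hypothetical 1-D sample `[(x,) for x in [0, 1, 2, 3, 4, 9, 14, 15, 16, 17, 18]]` and `h=2.5, tau=6, eps=0.05`, the isolated point at 9 links the two groups at low levels. Once rho exceeds its density score of 1/11, the level set splits and the call returns rho = 0.1 with the clusters `[[0, 1, 2, 3, 4], [6, 7, 8, 9, 10]]`. Discarding components that do not reach the level rho + 2*eps is only a crude stand-in for the uncertainty control mentioned above; it keeps small spurious components caused by estimation error from being reported as separate clusters.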

Publications

Fully adaptive density-based clustering. Ann. Statist., 43:2132–2167, 2015 (plus two supplements totaling 52 pages)
I. Steinwart

Towards an axiomatic approach to hierarchical clustering of measures. J. Mach. Learn. Res., 16:1949–2002, 2015
P. Thomann, I. Steinwart, and N. Schmid

Kernel density estimation for dynamical systems. Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2016
H. Hang, I. Steinwart, Y. Feng, and J.A.K. Suykens

Adaptive clustering using kernel density estimators. Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2017
I. Steinwart, B.K. Sriperumbudur, and P. Thomann

Sobolev norm learning rates for the regularized least-squares algorithm. Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2017
S. Fischer and I. Steinwart
