Project Details
Classification -- Preprocessed and high-dimensional data sets
Applicant
Professorin Dr. Angelika Rohde
Subject Area
Mathematics
Term
since 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 460867398
We study classification based on two types of preprocessing, designed for imbalanced and for sensitive data sets, as well as the consequences of high-dimensional features. Imbalanced data sets are known to significantly reduce the performance of classifiers in statistical learning: learning algorithms designed for equally balanced classes tend to be biased towards the majority class. We will introduce a theoretical framework to study this bias-towards-the-majority-class effect and, jointly with Project IV, develop statistically efficient data-reduction preprocessing within the majority class. In parallel, supervised classification is studied based on preprocessed training data satisfying an α-local differential privacy constraint. The particularly challenging case of privatized functional (i.e., infinite-dimensional) covariates is developed in collaboration with Project III. Finally, we will investigate the misclassification error in a framework where the number of feature variables is not negligible compared to the sample size. In the case of high-dimensional features, the computational cost of classical statistical procedures becomes prohibitive. Together with Project II, we investigate the statistical accuracy of iterative gradient descent methods and develop computationally efficient and fully data-driven learning algorithms.
DFG Programme
Research Units
Subproject of
FOR 5381: Mathematical Statistics in the Information Age - Statistical Efficiency and Computational Tractability
International Connection
Austria
Partner Organisation
Fonds zur Förderung der wissenschaftlichen Forschung (FWF)
Cooperation Partner
Professor Dr. Lukas Steinberger