Project Details

Classification -- Preprocessed and high-dimensional data sets

Subject Area Mathematics
Term since 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 460867398
 
We study classification based on two types of preprocessing designed for imbalanced and sensitive data sets, as well as the consequences of high-dimensional features. Imbalanced data sets are known to significantly reduce the performance of classifiers in statistical learning. Learning algorithms designed for balanced classes tend to be biased towards the majority class. We will introduce a theoretical framework to study this bias-towards-the-majority-class effect and, jointly with Project IV, will develop statistically efficient data-reduction preprocessing within the majority class. In parallel, supervised classification is studied based on preprocessed training data satisfying an α-local differential privacy constraint. The particularly challenging case of privatized functional (i.e., infinite-dimensional) covariates is addressed in collaboration with Project III. Finally, we will investigate the misclassification error in a framework where the number of feature variables is not negligible compared to the sample size. In the case of high-dimensional features, the computational cost of classical statistical procedures becomes prohibitive. Together with Project II, we investigate the statistical accuracy of iterative gradient descent methods and develop computationally efficient and fully data-driven learning algorithms.
DFG Programme Research Units
International Connection Austria
Cooperation Partner Professor Dr. Lukas Steinberger
 
 

