Project Details
Reduction of sampling and data complexity by modern sparsification techniques
Applicant
Professor Dr. Tino Ullrich
Subject Area
Mathematics
Term
since 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 533875539
The present digital age has seen an enormous growth of digital data that is transmitted, collected, and processed, for instance in machine learning (ML) based applications. To cope with this phenomenon, efficient methods are needed for storing, handling, and reducing the resulting massive data sets, as well as for extracting the relevant information from them. This demand poses major challenges at the frontier of mathematics, computer science, and electrical engineering, some of which require fundamentally new approaches.

Despite the great success of ML in the past decade, a fundamental problem remains: most ML-based algorithms need a huge amount of training data to “guarantee” their success. However, data acquisition is often difficult and expensive, for instance when high-priced sensors are required. In addition, the training step typically comes with a computational effort that grows rapidly with the amount of data. In this project, we therefore aim for new approaches to reduce the sampling and data complexity in such models. Rather than developing new ML methods, we focus on the problem of data reduction.

It has turned out that many data sets can be substantially “sub-sampled” without losing relevant information, by exploiting the inherent sparsity of the data. The corresponding problem of “sparsification” appears in different scenarios. The core task can mostly be formulated as reducing the number of data vectors in a huge data matrix while preserving its spectral properties, since the relevant information is encoded in the spectrum. Closely connected is the task of frame subsampling, a field which has seen recent progress based on the solution of the Kadison–Singer problem. Building on such recent results, we plan to develop and analyze new methods striving for optimal sparsity in data representation.

In a related but slightly different context, we aim to reduce the number of nodes in the sampling discretization of functions. This allows for controlling the optimal worst-case error in the extraordinarily difficult but important problem of recovering functions from partial and incomplete information. One of our main goals here is to overcome the current gap between non-constructive and constructive methods. Progress in this direction will certainly help to conceive new methods suitable for practical applications in the future.
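To make the core task more concrete, the following minimal Python sketch illustrates one standard randomized approach to it: subsampling the columns (data vectors) of a matrix proportionally to their leverage scores and rescaling them so that the spectrum of the Gram matrix is approximately preserved. This is a generic illustration rather than the project's own method; the matrix sizes, sample counts, and variable names are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data matrix: n = 5000 data vectors in R^d with d = 20.
d, n = 20, 5000
X = rng.standard_normal((d, n)) * np.linspace(1.0, 5.0, d)[:, None]

# Leverage scores of the columns: l_i = x_i^T (X X^T)^+ x_i.
# They sum to rank(X) and measure each vector's spectral importance.
G_inv = np.linalg.pinv(X @ X.T)
lev = np.einsum("ij,jk,ki->i", X.T, G_inv, X)
probs = lev / lev.sum()

# Keep only m << n columns, sampled i.i.d. proportionally to leverage,
# and rescale so that the subsampled Gram matrix is unbiased.
m = 400
idx = rng.choice(n, size=m, replace=True, p=probs)
Y = X[:, idx] / np.sqrt(m * probs[idx])

# The sorted eigenvalues of X X^T and Y Y^T should agree up to a small
# relative error, i.e. the spectral information survives the subsampling.
ev_full = np.linalg.eigvalsh(X @ X.T)
ev_sub = np.linalg.eigvalsh(Y @ Y.T)
print("max relative eigenvalue deviation:",
      np.max(np.abs(ev_sub - ev_full) / ev_full))
```

Randomized schemes of this kind typically need on the order of d log d subsampled vectors. The Kadison–Singer-type frame subsampling results mentioned above are what make reductions down to roughly d vectors possible, though so far often only via non-constructive or computationally expensive selection procedures, which is precisely the gap between non-constructive and constructive methods that the project aims to close.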
DFG Programme
Research Grants