Project Details

Robustly Identifying Dependent Components in Multiple High-Dimensional Data Sets Based on Few Observations

Subject Area: Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term: 2014 to 2025
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 262301625
Final Report Year: 2023

Final Report Abstract

In this project, we developed systematic, theory-backed techniques for robustly identifying dependent and correlated components in multiple high-dimensional data sets. Identifying dependent components across multiple data sets is a fundamental problem in many practical applications, ranging from communications engineering (e.g., estimating the number of sources impinging upon a group of sensor arrays) to climate science (e.g., identifying coupled climate patterns) to biomedicine (e.g., finding correlated features for the fusion of brain imaging data from different modalities). The challenge in these applications is that the data sets are often high-dimensional, offer only a few observations, and contain latent components with unknown probability distributions. We showed that determining the complete correlation structure, i.e., which components are correlated across which data sets, fully characterizes the second-order dependence between the data sets. The project was subdivided into two parts. In the first grant, we developed algorithms to identify the correlations between components in two high-dimensional data sets, where each component is either uncorrelated or correlated between both sets. In the follow-up joint DFG grant, we considered the identification of the correlation structure among more than two data sets. This more general problem is more complex, since some components may be entirely uncorrelated, some may be correlated between only some of the sets, and some may be correlated between all sets. In contrast to existing techniques in the literature, the developed techniques do not assume an a priori correlation structure and work well for a large number of data sets as well as in the presence of distributional uncertainties, heavy-tailed noise, and outliers. Due to their statistical guarantees, the methods can be applied out of the box to a large array of practical problems.
The applications within the scope of this project included wireless acoustic networks, array processing, neuroscience, and epilepsy, where identifying the complete correlation structure and quantifying the strength of association between multiple data sets led to significant performance gains and to the identification of potential biomarkers. The developed techniques are publicly available so that other researchers can use and modify them to design improved algorithms.
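
As a rough illustration of the two-data-set setting described in the abstract, the sketch below computes sample canonical correlations between two synthetic high-dimensional data sets after a PCA rank reduction. It is not the project's method: the function name `canonical_correlations`, the synthetic data, the correlation coefficient 0.9, and the hand-picked rank of 5 are all assumptions made for illustration. Choosing the rank and deciding how many of the resulting correlations are genuinely nonzero is exactly the model-selection problem the project addresses, and this sketch does not attempt it.

```python
import numpy as np

def canonical_correlations(X, Y, rank):
    """Sample canonical correlations between X (n x p) and Y (n x q),
    computed after keeping `rank` principal components of each set."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Left singular vectors give orthonormal bases of the principal
    # subspaces of each data set in sample space.
    Ux, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Yc, full_matrices=False)
    # The singular values of Ux_r^T Uy_r are the canonical correlations
    # of the rank-reduced data (cosines of the principal angles).
    return np.linalg.svd(Ux[:, :rank].T @ Uy[:, :rank], compute_uv=False)

# Synthetic example (all values are assumptions): n = 100 observations,
# 300 dimensions per set, 5 latent components per set, 2 of them correlated.
rng = np.random.default_rng(0)
n, p, q = 100, 300, 300
rho = 0.9
s_x = rng.standard_normal((n, 2))                      # correlated components
s_y = rho * s_x + np.sqrt(1 - rho**2) * rng.standard_normal((n, 2))
i_x = rng.standard_normal((n, 3))                      # independent components
i_y = rng.standard_normal((n, 3))
X = np.hstack([s_x, i_x]) @ rng.standard_normal((5, p)) \
    + 0.1 * rng.standard_normal((n, p))
Y = np.hstack([s_y, i_y]) @ rng.standard_normal((5, q)) \
    + 0.1 * rng.standard_normal((n, q))

rho_hat = canonical_correlations(X, Y, rank=5)
print(np.round(rho_hat, 2))
```

In this toy setting the two largest sample correlations should stand out clearly from the remaining three. With few observations, however, sample canonical correlations are biased upward, which is why a naive threshold is unreliable and why a principled detection procedure is needed.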
