Project Details
Robustly Identifying Dependent Components in Multiple High-Dimensional Data Sets Based on Few Observations
Subject Area
Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term
from 2014 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 262301625
The objective of this proposal is the development of methods to robustly identify dependent components in multiple high-dimensional data structures when sample support is relatively small. Many algorithms require this information, that is, which components are dependent across the data sets or how many such components there are, as an input parameter. For instance, in biomedicine, there are established approaches for fusing data from different brain imaging modalities, but applying them requires knowing the dependent components in the different feature sets. As another example, in sensor array processing, many algorithms for resolving sources (e.g., estimating their directions of arrival) require prior information about the number of sources impinging upon the array. Often, this problem is solved ad hoc, with widely varying results. The development of systematic approaches will therefore be of interest to a wide range of areas in the natural sciences and engineering. The focus of our proposal will be on theory, but in order to illustrate and investigate the performance of our methods, we will choose selected applications in biomedicine. More specifically, our objectives in this proposal are:
- To develop model-selection rules for multiple data sets with relatively small sample support. Treating multiple data sets is much more difficult than finding dependencies between two data sets because there are many more possible dependence structures. The very few existing approaches work only for large sample support and make very restrictive assumptions about the underlying correlation structure.
- To make our second-order techniques robust against deviations from Gaussianity. This is critical for dealing with heavy-tailed noise and outliers, which are commonplace in many applications.
- To first build a theory based on second-order correlations, which capture linear dependencies between data sets, and then extend our approaches to also take nonlinear dependencies into account.
- To investigate the restrictions that small sample support imposes on the identifiability of nonlinear dependencies. We would expect the number of samples to determine the amount of information that can be extracted from the data sets.
- To apply our techniques to selected problems in biomedicine. We expect these applications to benefit greatly from this research project. It is still commonplace in the biomedical community to solve model-selection problems using ad hoc approaches and rules of thumb. A systematic approach will help provide more convincing and satisfying solutions.
DFG Programme
Research Grants