Robustly Identifying Dependent Components in Multiple High-Dimensional Data Sets Based on Few Observations
Final Report Abstract
In this project, we developed systematic and theory-backed techniques for robustly identifying dependent and correlated components in multiple high-dimensional data sets. The identification of dependent components in multiple data sets is a fundamental problem in many practical applications ranging from communications engineering (e.g., estimating the number of sources impinging upon a group of sensor arrays) to climate science (e.g., identifying coupled climate patterns) to biomedicine (e.g., finding correlated features for the fusion of brain imaging data from different modalities). The challenge in these applications is that the data sets are often high-dimensional, offer only a few observations or samples, and contain latent components with unknown probability distributions. We showed that determining the complete correlation structure, i.e., which components are correlated across which data sets, fully characterizes the second-order dependence between the data sets. The project was subdivided into two parts. In the first grant, algorithms to identify the correlations between components in two high-dimensional data sets were developed. In this case, each component is either uncorrelated or correlated between both sets. In the follow-up joint DFG grant, the identification of the correlation structure between more than two data sets was considered. This more general problem is more complex, since some components may be entirely uncorrelated, some may be correlated across only a subset of the sets, and some may be correlated across all sets. In contrast to existing techniques in the literature, the developed techniques do not assume an a priori correlation structure, and they work well for a large number of data sets and in the presence of distributional uncertainties, heavy-tailed noise, and outliers. Due to their statistical guarantees, the methods can be applied out of the box to a large array of practical problems.
The applications in the scope of this project included wireless acoustic networks, array processing, neuroscience, and epilepsy, where identifying the complete correlation structure and quantifying the strength of association between multiple data sets lead to significant performance gains and identification of potential biomarkers. Our developed techniques are publicly available to allow other researchers to use and modify them for designing improved algorithms.
Publications
-
Bootstrap-based detection of the number of signals correlated across multiple data sets. 2016 50th Asilomar Conference on Signals, Systems and Computers, 610-614. IEEE.
Hasija, Tanuj; Song, Yang; Schreier, Peter J. & Ramírez, David
-
Canonical correlation analysis of high-dimensional data with very small sample support. Signal Processing, 128, 449-458.
Song, Yang; Schreier, Peter J.; Ramírez, David & Hasija, Tanuj
-
Detecting the dimension of the subspace correlated across multiple data sets in the sample poor regime. 2016 IEEE Statistical Signal Processing Workshop (SSP), 1-5. IEEE.
Hasija, Tanuj; Song, Yang; Schreier, Peter J. & Ramírez, David
-
Sample-poor estimation of order and common signal subspace with application to fusion of medical imaging data. NeuroImage, 134, 486-493.
Levin-Schwartz, Yuri; Song, Yang; Schreier, Peter J.; Calhoun, Vince D. & Adalı, Tülay
-
A sparse CCA algorithm with application to model-order selection for small sample support. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4721-4725. IEEE.
Lameiro, Christian & Schreier, Peter J.
-
Determining the Dimension of the Improper Signal Subspace in Complex-Valued Data. IEEE Signal Processing Letters, 24(11), 1606-1610.
Hasija, Tanuj; Lameiro, Christian & Schreier, Peter J.
-
Exercise-Induced Changes of Multimodal Interactions Within the Autonomic Nervous Network. Frontiers in Physiology, 10.
Vieluf, Solveig; Hasija, Tanuj; Jakobsmeyer, Rasmus; Schreier, Peter J. & Reinsberger, Claus
-
Source Enumeration and Robust Voice Activity Detection in Wireless Acoustic Sensor Networks. 2019 53rd Asilomar Conference on Signals, Systems, and Computers, 1257-1261. IEEE.
Hasija, Tanuj; Gölz, Martin; Muma, Michael; Schreier, Peter J. & Zoubir, Abdelhak M.
-
Determining the dimension and structure of the subspace correlated across multiple data sets. Signal Processing, 176, 107613.
Hasija, Tanuj; Marrinan, Timothy; Lameiro, Christian & Schreier, Peter J.
-
Generalized tonic-clonic seizures are accompanied by changes of interrelations within the autonomic nervous system. Epilepsy & Behavior, 124, 108321.
Vieluf, Solveig; Hasija, Tanuj; Schreier, Peter J.; El Atrache, Rima; Hammond, Sarah; Mohammadpour Touserkani, Fatemeh; Sarkis, Rani A.; Loddenkemper, Tobias & Reinsberger, Claus
-
A GLRT for estimating the number of correlated components in sample-poor mCCA. 2022 30th European Signal Processing Conference (EUSIPCO), 2091-2095. IEEE.
Hasija, Tanuj & Marrinan, Timothy
-
Estimating Test Statistic Distributions for Multiple Hypothesis Testing in Sensor Networks. 2022 56th Annual Conference on Information Sciences and Systems (CISS), 90-95. IEEE.
Gölz, Martin; Zoubir, Abdelhak M. & Koivunen, Visa
-
Improving Inference for Spatial Signals by Contextual False Discovery Rates. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5967-5971. IEEE.
Gölz, Martin; Zoubir, Abdelhak M. & Koivunen, Visa
-
Multiple Hypothesis Testing Framework for Spatial Signals. IEEE Transactions on Signal and Information Processing over Networks, 8, 771-787.
Gölz, Martin; Zoubir, Abdelhak M. & Koivunen, Visa
-
Identifying the Complete Correlation Structure in Large-Scale High-Dimensional Data Sets with Local False Discovery Rates
Gölz, Martin; Hasija, Tanuj; Muma, Michael & Zoubir, Abdelhak M.
