Project Details
REFOCuS: Robust Estimation for Cell- and Casewise Contamination in Sparse Regression Models
Applicant
Professor Dr.-Ing. Michael Muma
Subject Area
Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering
Term
from 2019 to 2023
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 425884435
With the rapid advances in data science and signal processing, there is an ever-increasing need for reliable and robust information extraction and processing. Regression analysis is one of the most widely used techniques for investigating and modeling the relations between variables, with many applications in engineering, economics, biomedicine, the social sciences, and other fields. This proposal develops advanced robust regression methods that are not significantly affected by outliers or small departures from the assumed model.
Robust statistical signal processing currently faces new challenges: a new research focus is urgently needed because of the complexity of today’s data, which include latent low-rank structures, sparsity, impulsive noise, outliers, and missing values. A major concern is high dimensionality, including settings where the sample size is smaller than, or not much larger than, the data dimension. Traditional robust methods based on asymptotic theory perform poorly in such settings, and robust regularized methods that find sparse solutions are needed. However, to satisfy today’s robustness requirements, it is not enough for the signal model to match the data structure; the contamination model must also be realistic. In robust statistical signal processing, by far the most popular model for describing deviations from the nominal model is the Tukey-Huber contamination model, which assumes that a minority of the cases, i.e., the rows of the regression matrix, may be contaminated. Recently, researchers have come to realize that this outlying-rows paradigm is no longer sufficient for modern high-dimensional data sets. Intuitively, the problem arises because, in high dimensions, there is a high probability that most observations are contaminated in at least one of their components. These considerations have motivated a more general contamination model, the independent contamination model (ICM), which allows for modeling both cell-wise and case-wise outliers.
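The intuition about high dimensions can be made concrete with a short calculation. Assuming, for illustration only, that each of the p components of an observation is contaminated independently with the same small probability ε, a row is entirely clean with probability (1 - ε)^p, so the fraction of contaminated rows is 1 - (1 - ε)^p, which approaches 1 as p grows. The minimal Python sketch below (standard library only) contrasts the two extremes that the ICM covers: fully independent cell indicators (cell-wise outliers) and a single indicator shared by the whole row (the case-wise, Tukey-Huber setting). The function names, ε = 0.01, and the sample size are hypothetical choices for this sketch; it does not model the more general dependence patterns the ICM allows, and it is not the project's method.

```python
import random


def fraction_contaminated_analytic(eps: float, p: int) -> float:
    """P(a row has at least one outlying cell) when each of its p cells
    is contaminated independently with probability eps (cell-wise extreme)."""
    return 1.0 - (1.0 - eps) ** p


def simulate(n: int, p: int, eps: float, cellwise: bool, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of n rows with >= 1 contaminated cell.

    cellwise=True : one independent Bernoulli(eps) indicator per cell.
    cellwise=False: one Bernoulli(eps) indicator per row, shared by all
                    p cells (case-wise / Tukey-Huber extreme).
    """
    rng = random.Random(seed)
    contaminated_rows = 0
    for _ in range(n):
        if cellwise:
            # any() stops at the first contaminated cell; the result is the
            # same event "at least one cell is an outlier".
            hit = any(rng.random() < eps for _ in range(p))
        else:
            hit = rng.random() < eps
        contaminated_rows += hit
    return contaminated_rows / n


if __name__ == "__main__":
    eps = 0.01  # hypothetical per-cell (or per-row) contamination probability
    for p in (1, 10, 100, 1000):
        print(
            f"p={p:5d}  analytic={fraction_contaminated_analytic(eps, p):.3f}  "
            f"cellwise sim={simulate(2000, p, eps, cellwise=True):.3f}  "
            f"casewise sim={simulate(2000, p, eps, cellwise=False):.3f}"
        )
```

With ε = 0.01, the analytic fraction of contaminated rows is about 0.096 for p = 10, 0.634 for p = 100, and essentially 1 for p = 1000, whereas under the case-wise model it stays near ε regardless of p. This is why estimators that downweight or discard whole rows lose most of the data under cell-wise contamination.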
Traditional robust estimators quickly break down under ICM contamination, and even in the non-sparse setting, very few ICM-robust approaches have been proposed. Thus, the main objective of REFOCuS is to perform robust variable selection and parameter estimation for sparse, high-dimensional regression models that suffer from ICM contamination. To achieve this aim, we will develop a fundamentally new framework based on sparsity- and robustness-inducing polytopes. Parallels to existing penalized robust estimators will be explored, and statistical and robustness analyses will be performed. A biomedical application will serve as real-data validation for the proposed methods. The successful completion of the project will provide computationally efficient and tractable ICM-robust regression methods, bringing robustness to high-dimensional data science.
DFG Programme
Research Grants