Survival models with high-dimensional data structure (H: High-dimensional)

Applicant Professor Dr. Martin Schumacher

Subject Area Epidemiology and Medical Biometrics/Statistics

Term Funded from 2007 to 2011

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 5470786

Many clinical disciplines still suffer from the comparatively low predictive power of specially developed risk scores. The hope is that the identification of genomic and proteomic features will initiate essential progress. Here, microarray data and protein mass spectra promise further insights. The understanding of whole genomes and the development of disease-specific biomarkers should aid diagnosis, improve the performance of prognostic scores, and finally lead to new treatments. Such data are characterized by a huge number of potential predictors and typically only a few patients, which makes them difficult to analyze. Standard survival techniques, such as fitting a Cox regression model by maximizing the partial likelihood, are not directly applicable. In this project we adapt statistical approaches that can deal with high-dimensional data structures, such as penalized estimation and boosting. These methods have been developed mostly for the continuous and binary response case. Only recently have some proposals been made for right-censored event time response variables, but there are still methodological problems. An example is the rather fragile selection of the number of steps required for path algorithm procedures. There is little research on modelling the time variation of covariates for high-dimensional data, potentially in combination with time-varying effects on survival. Therefore we start with discrete-time survival models, where time-varying covariates are easily incorporated and available techniques for binary response variables can be adapted. In a next step we develop a competitive continuous-time approach. Boosting and path algorithm techniques will be investigated for estimation. A central problem is the selection of regularization or complexity parameters. For our discrete-time survival approach, model selection criteria built on model-based estimates of the effective degrees of freedom will be adapted.
For validation, we will investigate bootstrap-based estimates of the degrees of freedom. For continuous-time survival models, such degrees of freedom estimates are difficult to obtain, and it is important to take the right-censored data structure into account. We will focus on resampling-based estimates of prediction error that incorporate time and deal appropriately with right censoring. These estimates will then be used for selection of model complexity, to avoid overfitting for our flexible time survival approach. As an alternative, model selection based on false discovery rates will be investigated. The work in this project will be closely coordinated with the projects of our clinical research partners. In particular, a comprehensive analysis for the project "Microarray validation of cardiovascular risk factors" will be provided. Further benefit can be expected from collaboration with the project "Time-varying and Dynamic scores".
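
The discrete-time survival models mentioned in the abstract can use binary response techniques because of a standard data rearrangement: each patient contributes one binary record per time interval at risk. The following is a minimal sketch of that rearrangement, not the project's implementation. The function name `person_period`, the `covariates` hook, and the toy data are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence


def person_period(
    times: Sequence[int],
    events: Sequence[int],
    covariates: Callable[[int, int], List[float]],
) -> List[Dict]:
    """Expand (time, event) pairs into one binary record per interval at risk.

    times[i]         : last observed interval of subject i (1-based integer)
    events[i]        : 1 if the event occurred in that interval, 0 if right-censored
    covariates(i, t) : covariate vector of subject i in interval t; this hook
                       (hypothetical) is where time-varying covariates enter
    """
    rows = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        for t in range(1, t_i + 1):
            # The response is 1 only in the final interval of an uncensored subject.
            y = 1 if (t == t_i and d_i == 1) else 0
            rows.append({"id": i, "interval": t, "y": y, "x": covariates(i, t)})
    return rows


# Toy data, invented purely for illustration.
times = [3, 2, 1]
events = [1, 0, 1]
rows = person_period(times, events, lambda i, t: [float(i), float(t)])
```

A penalized or boosted binary regression could then be fitted to the `y` column of the expanded records, with `interval` entering the model as a baseline hazard term. A right-censored subject simply contributes only zeros.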

DFG Programme Research Units

Subproject of FOR 534: Statistical Modeling and Data Analysis in Clinical Epidemiology

Participating Person Professor Dr. Jens Timmer
