Project Details

Increasing clinical usefulness of gene signature prediction rules through simplification and validation

Subject Area Medical Informatics and Medical Bioinformatics
Term from 2011 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 208375936
 
Final Report Year 2022

Final Report Abstract

The six work packages (WPs) had very different objectives, ranging from the design of simulation studies for the comparison of statistical strategies to the importance of more transparent reporting of research. Both PIs are active in the STRengthening Analytical Thinking for Observational Studies (STRATOS) initiative, and three WPs benefited from this international cooperation. Some of the projects have a long-term aim and are still ongoing.

In WP1, we investigated different aspects of the design of comparison studies in methodological statistical research. Particular attention was given to optimistic bias in the assessment of (new) methods, which we investigated through a literature-based meta-study, our own benchmarking study, and a so-called "cross-design validation" experiment in which methods were evaluated using different designs. Furthermore, we published a (methodological) study protocol, an important step towards more reliable comparison studies. This WP motivated a special issue in the Biometrical Journal.

Within WP2, we conducted a large-scale comparison study of prediction methods from machine learning and statistics (based on boosting, penalized regression, and random forests) using 18 multi-omics cancer datasets from The Cancer Genome Atlas (TCGA). The results indicate that the methods perform similarly (and disappointingly), that the variability across datasets is large, and that methods taking the multi-omics structure into account have slightly better prediction performance. The case of multi-omics data with blockwise missing data (i.e., not all omics types are available for all patients) was investigated in a follow-up study.

The projects conducted as part of WP3 yielded empirical results on the behavior of parameter tuning and validation strategies in the context of multi-center studies.
Regarding parameter tuning, we proposed and evaluated several procedures that aim to select tuning parameter values leading to better-generalizing prediction models in a multi-center setting. Regarding validation strategies, we assessed the advantages of using multi-center versus single-center data to fit prediction models in various scenarios using simulations and provided practical recommendations.

In WP4, we compared approaches that combine clinical and omics data. We showed in examples that omics data may not add much to the predictive ability of a clinical predictor, provided that the information from clinical variables is fully used. In a related simulation study, we compared 70 approaches but had to recognize that our simulation study had weaknesses and that the role of several relevant parameters needs to be better understood. The project is ongoing.

WP5 dealt with the translation of suitable approaches from low-dimensional to high-dimensional data (LDD, HDD). We concentrated on methods to identify influential points (IPs) and on the non-negative garrote (NNG), one of the first proposals to combine variable selection and shrinkage. Using data from six published HDD analyses, we checked for IPs with recently proposed methods and extended some of them. We showed that IPs play a role in (nearly) all HDD analyses. Based on our experience, we concluded that the importance of checking for IPs is still underrated in HDD analyses. This issue is also stressed in an overview paper by the STRATOS topic group 'High-dimensional data'. Concerning model building in HDD, we showed that the NNG can be used for the analysis if suitable initial estimates are chosen, and in examples we showed that the NNG has advantages over the popular lasso.

In WP6, we stressed the importance of structured reporting as a key instrument for improving the completeness and transparency of research reporting, not only in the health sciences but also in methodological research. Assessing fifteen prognostic factor studies in a REMARK (REporting recommendations for tumor MARKer prognostic studies) profile, we clearly demonstrated severe weaknesses in the analysis and reporting of prognosis studies. Together with cooperation partners, we started work on a related TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) profile and will expand the REMARK reporting guidelines to cover any type of factor used for any diagnostic or prognostic purpose.
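To make the NNG idea mentioned above concrete: the garrote multiplies each initial coefficient by a nonnegative shrinkage factor, so that a factor of zero removes the variable entirely. The sketch below is an illustration only, not code from the project; it uses scikit-learn on toy data and an OLS initial estimate, whereas in the high-dimensional setting a ridge or lasso fit would supply the initial estimates.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def nonnegative_garrote(X, y, alpha=0.05):
    """Breiman's non-negative garrote, computed as a nonnegative lasso.

    Each initial coefficient beta_j (here from OLS; in high dimensions a
    ridge or lasso fit would supply the initial estimates instead) is
    multiplied by a shrinkage factor c_j >= 0; c_j = 0 drops variable j,
    so variable selection and shrinkage happen in a single step.
    """
    beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_
    Z = X * beta_init  # rescale each column by its initial coefficient
    # A lasso constrained to nonnegative coefficients on the rescaled
    # design yields the shrinkage factors c.
    c = Lasso(alpha=alpha, positive=True, fit_intercept=False).fit(Z, y).coef_
    return c * beta_init

# Toy data: only the first two of five predictors carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)
beta = nonnegative_garrote(X, y)
```

The choice of initial estimates is the critical design decision here, which is why the report stresses that the NNG works in HDD "if suitable initial estimates are chosen".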

Publications

  • 2019. A plea for taking all available clinical information into account when assessing the predictive value of omics data. BMC Medical Research Methodology 19:162
    A. Volkmann, R. De Bin, W. Sauerbrei, A.-L. Boulesteix
    (See online at https://doi.org/10.1186/s12874-019-0802-0)
  • 2020. An introduction to statistical simulations in health research. BMJ Open 10:e039921
    A.-L. Boulesteix, R. Groenwold, M. Abrahamowicz, H. Binder, M. Briel, R. Hornung, T. Morris, J. Rahnenführer, W. Sauerbrei
    (See online at https://doi.org/10.1136/bmjopen-2020-039921)
  • 2020. Combining clinical and molecular data in regression prediction models: insights from a simulation study. Briefings in Bioinformatics 21(6):1904-1919
    R. De Bin, A.-L. Boulesteix, A. Benner, N. Becker, W. Sauerbrei
    (See online at https://doi.org/10.1093/bib/bbz136)
  • 2020. Single-center versus multi-center data sets for molecular prognostic modeling: A simulation study. Radiation Oncology 15:109
    D. Samaga, R. Hornung, H. Braselmann, J. Hess, H. Zitzelsberger, C. Belka, A.-L. Boulesteix, K. Unger
    (See online at https://doi.org/10.1186/s13014-020-01543-1)
  • 2021. Improved outcome prediction across data sources through robust parameter tuning. Journal of Classification 38:212-231
    N. Ellenbach, A.-L. Boulesteix, B. Bischl, K. Unger, R. Hornung
    (See online at https://doi.org/10.1007/s00357-020-09368-z)
  • 2021. Large-scale benchmark study of survival prediction methods using multi-omics data. Briefings in Bioinformatics 22(3):1-15
    M. Herrmann, P. Probst, R. Hornung, V. Jurinovic, A.-L. Boulesteix
    (See online at https://doi.org/10.1093/bib/bbaa167)
  • 2021. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biology 22:152
    S. Buchka, A. Hapfelmeier, P.P. Gardner, R. Wilson, A.-L. Boulesteix
    (See online at https://doi.org/10.1186/s13059-021-02365-4)
  • 2022. Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12(2):e1441
    C. Niessl, M. Herrmann, C. Wiedemann, G. Casalicchio, A.-L. Boulesteix
    (See online at https://doi.org/10.1002/widm.1441)
  • 2022. REMARK Guidelines for Tumour Biomarker Study Reporting: A Remarkable History. British Journal of Cancer 1-3
    D.F. Hayes, W. Sauerbrei, L.M. McShane
    (See online at https://doi.org/10.1038/s41416-022-02046-4)
  • 2022. Structured reporting to improve transparency of analyses in prognostic marker studies. BMC Medicine 20:1-9
    W. Sauerbrei, T. Haeussler, J. Balmford, M. Huebner
    (See online at https://doi.org/10.1186/s12916-022-02304-5)
 
 
