
Increasing the clinical utility of gene signatures through simplification and validation

Subject area Medical Informatics and Medical Bioinformatics
Funding from 2011 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 208375936
Year of creation 2022

Summary of project results

The six work packages (WPs) had very different objectives, ranging from the design of simulation studies for the comparison of statistical strategies to the importance of more transparent reporting of research. Both PIs are active in the STRengthening Analytical Thinking for Observational Studies (STRATOS) initiative, and three WPs benefited from this international cooperation. Some of the projects have long-term aims and are still ongoing.

In WP1, we investigated different aspects of the design of comparison studies in methodological statistical research. Particular attention was given to optimistic bias in the assessment of (new) methods, which we investigated through a literature-based meta-study, our own benchmarking study and a so-called "cross-design validation" experiment in which methods were evaluated under different designs. Furthermore, we published a (methodological) study protocol, an important step towards more reliable comparison studies. This WP motivated a special issue in the Biometrical Journal.

Within WP2, we conducted a large-scale comparison study of prediction methods from machine learning and statistics (based on boosting, penalized regression and random forests) using 18 multi-omics cancer datasets from 'The Cancer Genome Atlas' (TCGA). The results indicate that the methods have similar (disappointing) performances, that the variability across datasets is large, and that methods taking the multi-omics structure into account have a slightly better prediction performance. The case of multi-omics data with blockwise missing data (i.e. not all omics types are available for all patients) was investigated in a follow-up study. A schematic sketch of such a cross-validated comparison design is given at the end of this summary.

The projects conducted as part of WP3 yielded empirical results on the behavior of parameter tuning and validation strategies in the context of multi-center studies. Regarding parameter tuning, we proposed and evaluated several procedures that aim at selecting tuning-parameter values leading to better generalizing prediction models in a multi-center setting. Regarding validation strategies, we used simulations to assess the advantages of fitting prediction models on multi-center versus single-center data in various scenarios and provided practical recommendations.

In WP4, we compared approaches that combine clinical and omics data. We showed in examples that omics data may not add much to the predictive ability of a clinical predictor, provided that the information from clinical variables is fully used. In a related simulation study we compared 70 approaches, but had to recognize that our simulation study had weaknesses and that the role of several relevant parameters needs better understanding. This project is ongoing.

WP5 dealt with the translation of suitable approaches from low-dimensional to high-dimensional data (LDD, HDD). We concentrated on methods to identify influential points (IPs) and on the non-negative garrote (NNG), one of the first proposals to combine variable selection and shrinkage. Using data from six published HDD analyses, we checked for IPs with recently proposed methods and extended some of them. We showed that IPs play a role in (nearly) all HDD analyses. Based on our experience, we concluded that the importance of checking for IPs is still underrated in HDD analyses. This issue is also stressed in an overview paper by the STRATOS topic group 'High-dimensional data'.
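As a minimal illustration of the kind of IP check discussed above (not the specific procedures developed or extended in WP5), the following sketch refits a lasso model with each observation left out in turn and flags observations whose removal shifts the coefficient vector unusually strongly; the simulated data, the lasso penalty and the flagging threshold are arbitrary placeholders.

    # Leave-one-out influence check for a high-dimensional regression model.
    # Illustrative sketch only: data, penalty and threshold are placeholders.
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    n, p = 80, 500                       # HDD setting: many more features than samples
    X = rng.normal(size=(n, p))
    beta = np.zeros(p); beta[:5] = 1.0   # sparse true signal
    y = X @ beta + rng.normal(size=n)
    y[0] += 15                           # artificially create one influential point

    model = Lasso(alpha=0.1, max_iter=10000)
    beta_full = model.fit(X, y).coef_.copy()

    influence = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                           # leave observation i out
        beta_i = model.fit(X[mask], y[mask]).coef_
        influence[i] = np.linalg.norm(beta_full - beta_i)  # coefficient shift

    # Flag observations whose removal changes the fitted coefficients markedly.
    threshold = influence.mean() + 3 * influence.std()
    print("potential influential points:", np.where(influence > threshold)[0])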
Concerning model building in HDD, we showed that the NNG can be used for such analyses if suitable initial estimates are chosen. In examples, we showed that the NNG has advantages over the popular lasso.

In WP6, we stressed the importance of structured reporting as one of the key instruments for improving the completeness and transparency of reporting, not only in the health sciences but also in methodological research. Assessing fifteen prognostic factor studies in a REMARK (REporting recommendations for tumor MARKer prognostic studies) profile, we could clearly demonstrate severe weaknesses in the analysis and reporting of prognosis studies. Together with cooperation partners, we started work on a related TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) profile and will expand the REMARK reporting guidelines to include any type of factor used for diagnostic or prognostic purposes.
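As background on the non-negative garrote mentioned in WP5: the NNG rescales an initial coefficient estimate by non-negative shrinkage factors obtained from a penalized least-squares fit, so that factors shrunk to zero remove the corresponding variables. The sketch below illustrates the idea under simple assumptions (simulated low-dimensional data, ridge initial estimates and an arbitrary penalty value); it is not the implementation used in the project.

    # Minimal sketch of the non-negative garrote: shrink an initial estimate
    # beta_init by factors c_j >= 0. Data and penalty are illustrative only.
    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(1)
    n, p = 100, 30
    X = rng.normal(size=(n, p))
    beta = np.zeros(p); beta[:4] = [2.0, -1.5, 1.0, 0.5]
    y = X @ beta + rng.normal(size=n)

    beta_init = Ridge(alpha=1.0).fit(X, y).coef_   # initial estimate
    Z = X * beta_init                              # column j scaled by beta_init[j]

    # Penalized NNG criterion ||y - Z c||^2 + lambda * sum(c_j) with c_j >= 0,
    # i.e. a lasso problem with a positivity constraint on the coefficients.
    garrote = Lasso(alpha=0.05, positive=True, max_iter=10000).fit(Z, y)
    c = garrote.coef_
    beta_nng = c * beta_init                       # final NNG coefficients

    print("selected variables:", np.where(c > 0)[0])

The choice of the initial estimate is exactly the point stressed above: in HDD, an ordinary least-squares initial estimate is not available, so a suitable (penalized) initial estimate has to be chosen.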

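As referenced in the WP2 paragraph above, the following schematic shows how several learners can be compared across several datasets with cross-validation. The synthetic binary-outcome data, the scikit-learn learners and the AUC metric merely stand in for the survival prediction methods and the 18 TCGA multi-omics datasets analyzed in the actual study.

    # Schematic benchmark loop: several learners, several datasets, cross-validation.
    # All datasets and learners here are illustrative stand-ins.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    learners = {
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "boosting": GradientBoostingClassifier(random_state=0),
        "penalized regression": LogisticRegression(penalty="l1", C=0.1,
                                                   solver="liblinear", max_iter=5000),
    }

    # Three synthetic 'datasets' of different size and dimensionality.
    datasets = [make_classification(n_samples=n, n_features=pf, n_informative=10,
                                    random_state=i)
                for i, (n, pf) in enumerate([(150, 500), (200, 1000), (100, 2000)])]

    for d, (X, y) in enumerate(datasets):
        for name, learner in learners.items():
            auc = cross_val_score(learner, X, y, cv=5, scoring="roc_auc")
            print(f"dataset {d}: {name:22s} mean AUC = {auc.mean():.3f}")
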
Project-related publications (selection)

  • 2019. A plea for taking all available clinical information into account when assessing the predictive value of omics data. BMC Medical Research Methodology 19:162
    A. Volkmann, R. De Bin, W. Sauerbrei, A.-L. Boulesteix
    (See online at https://doi.org/10.1186/s12874-019-0802-0)
  • 2020. An introduction to statistical simulations in health research. BMJ Open 10:e039921
    A.-L. Boulesteix, R. Groenwold, M. Abrahamowicz, H. Binder, M. Briel, R. Hornung, T. Morris, J. Rahnenführer, W. Sauerbrei
    (See online at https://doi.org/10.1136/bmjopen-2020-039921)
  • 2020. Combining clinical and molecular data in regression prediction models: insights from a simulation study. Briefings in Bioinformatics 21(6):1904-1919
    R. De Bin, A.-L. Boulesteix, A. Benner, N. Becker, W. Sauerbrei
    (See online at https://doi.org/10.1093/bib/bbz136)
  • 2020. Single-center versus multi-center data sets for molecular prognostic modeling: A simulation study. Radiation Oncology 15:109
    D. Samaga, R. Hornung, H. Braselmann, J. Hess, H. Zitzelsberger, C. Belka, A.-L. Boulesteix, K. Unger
    (See online at https://doi.org/10.1186/s13014-020-01543-1)
  • 2021. Improved outcome prediction across data sources through robust parameter tuning. Journal of Classification 38:212-231
    N. Ellenbach, A.-L. Boulesteix, B. Bischl, K. Unger, R. Hornung
    (See online at https://doi.org/10.1007/s00357-020-09368-z)
  • 2021. Large-scale benchmark study of survival prediction methods using multi-omics data. Briefings in Bioinformatics 22(3):1-15
    M. Herrmann, P. Probst, R. Hornung, V. Jurinovic, A.-L. Boulesteix
    (See online at https://doi.org/10.1093/bib/bbaa167)
  • 2021. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biology 22:152
    S. Buchka, A. Hapfelmeier, P.P. Gardner, R. Wilson, A.-L. Boulesteix
    (See online at https://doi.org/10.1186/s13059-021-02365-4)
  • 2022. Over-optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 12(2):e1441
    C. Niessl, M. Herrmann, C. Wiedemann, G. Casalicchio, A.-L. Boulesteix
    (See online at https://doi.org/10.1002/widm.1441)
  • 2022. REMARK Guidelines for Tumour Biomarker Study Reporting: A Remarkable History. British Journal of Cancer, 1-3
    D.F. Hayes, W. Sauerbrei, L.M. McShane
    (See online at https://doi.org/10.1038/s41416-022-02046-4)
  • 2022. Structured reporting to improve transparency of analyses in prognostic marker studies. BMC Medicine 20:1-9
    W. Sauerbrei, T. Haeussler, J. Balmford, M. Huebner
    (See online at https://doi.org/10.1186/s12916-022-02304-5)