Project Details

Lifespan AI - Project D2: From Longitudinal to Lifespan Predictions

Subject Area: Epidemiology and Medical Biometry/Statistics; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: since 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 459360854
 
Chronic diseases such as obesity, cardiovascular disease and dementia typically develop over long time spans before becoming manifest, and some have been suggested to originate even in utero. Studying the developmental mechanisms of diseases with long latency, and predicting diseases (or their markers) long before onset, is very challenging. As almost no study covers the entire lifespan, the joint analysis of multiple cohorts covering different periods of life is the most promising approach to gaining insight into long-term disease processes. So far, there is a striking lack of methods that can adequately analyse and predict the complex interplay of various factors based on pooled data from multiple cohorts. The proposed project intends to fill this gap by developing lifespan artificial intelligence (AI) methods suited to predicting individual-level health trajectories over extended time spans.
Generalised linear mixed-effects models (GLMM) are a flexible tool commonly used in epidemiology for modelling longitudinal and clustered data. However, among other drawbacks, they rest on restrictive parametric assumptions. Flexible nonlinear machine learning (ML) methods such as random forests (RF) and deep neural networks (DNN) mitigate these limitations but implicitly assume the data to be independent and identically distributed, which leads to inefficient estimates in a longitudinal setting. To combine the strengths of both while mitigating their limitations, this project aims to advance so-called mixed-effects machine learning (ME-ML) approaches. In particular, we will advance ME-ML approaches for predicting individual health trajectories from pooled cohort data by integrating the random-effects structure of GLMM into RF and DNN, and we will assess how far beyond the actual measurement period the devised method can validly predict.
We will further study how best to integrate data from multiple cohorts into a harmonised dataset, and how design features of multi-cohort studies, such as the periods of overlap between cohorts, affect the identifiability and performance of standard statistical methods for lifespan prediction and causal discovery. In summary, theoretical and practical investigations of data harmonisation and the design features of multi-cohort studies will be complemented by methodological developments in which promising candidates for lifespan AI methods based on ME-ML will be advanced and validated. Finally, the predictive performance of the newly developed methods will be compared with that of standard methods and evaluated under different potential study designs and statistical issues arising in multi-cohort studies. Data from several cohorts will be used to illustrate and validate the methods.
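To make the idea of integrating a random-effects structure into a random forest more concrete, the following is a minimal sketch, not the project's actual method. It assumes the simplest case of a random intercept per individual, simulated data, and a simplified alternating scheme in the spirit of published mixed-effects random forest algorithms: a forest is fitted to the outcome minus the current random intercepts, and the intercepts are then re-estimated from shrunken group means of the out-of-bag residuals. The function name `fit_random_intercept_forest`, the simulated data, and the moment-based variance updates are illustrative assumptions; NumPy and scikit-learn are assumed to be installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fit_random_intercept_forest(X, y, groups, n_iter=5, seed=0):
    """Alternate between a random forest (fixed part) and random intercepts.

    Hypothetical helper for illustration only. `groups` must hold integer
    codes 0..G-1, with every code occurring at least once.
    """
    n_groups = groups.max() + 1
    counts = np.bincount(groups, minlength=n_groups)
    b = np.zeros(n_groups)      # current random-intercept estimates
    var_b, var_e = 1.0, 1.0     # starting values for the variance components
    for _ in range(n_iter):
        # Step 1: fit the forest to the outcome with the intercepts removed.
        forest = RandomForestRegressor(
            n_estimators=100, oob_score=True, random_state=seed
        )
        forest.fit(X, y - b[groups])
        # Out-of-bag predictions limit leakage of the group effect
        # into the forest's own fitted values.
        resid = y - forest.oob_prediction_
        # Step 2: shrunken group means of the residuals
        # (the random-intercept BLUP for fixed variance components).
        group_mean = np.bincount(groups, weights=resid, minlength=n_groups) / counts
        shrink = var_b / (var_b + var_e / counts)
        b = shrink * group_mean
        # Step 3: crude moment-based variance updates (a simplification).
        var_e = np.mean((resid - b[groups]) ** 2)
        var_b = np.mean(b ** 2)
    return forest, b


# Simulated longitudinal data: 30 individuals with 20 observations each.
rng = np.random.default_rng(0)
n_groups, n_per = 30, 20
groups = np.repeat(np.arange(n_groups), n_per)
X = rng.uniform(-2, 2, size=(n_groups * n_per, 2))
b_true = rng.normal(0.0, 1.0, size=n_groups)
noise = rng.normal(0.0, 0.3, size=groups.size)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + b_true[groups] + noise

forest, b_hat = fit_random_intercept_forest(X, y, groups)
print("correlation of estimated and true intercepts:",
      np.corrcoef(b_hat, b_true)[0, 1])
```

A prediction for a known individual `i` would then combine both parts, for example `forest.predict(x_new) + b_hat[i]`, whereas a previously unseen individual would receive the forest prediction alone. Random slopes, non-Gaussian outcomes and multi-cohort pooling, which the project targets, are beyond this sketch.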
DFG Programme: Research Units
 
 

