Project Details
On the pros and cons of mega-analyses compared to meta-analyses for genome-wide association studies
Applicant
Professor Dr. Iris M. Heid
Subject Area
Epidemiology and Medical Biometry/Statistics
Term
from 2017 to 2021
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 360274005
Genome-wide association studies (GWAS) have been highly successful in identifying genetic variants associated with complex diseases. To increase sample size and statistical power, these studies are usually conducted as meta-analyses that combine study-specific aggregated statistics for each of 0.5 to 30 million genotyped and imputed variants. It is hypothesized that collecting individual participant data (IPD) to enable joint quality control, joint imputation, and joint modelling in one very large data set (mega-analysis), instead of a meta-analysis of aggregated statistics, would improve the ability to detect disease loci. However, little work has evaluated which aspects of mega-analysis, compared with meta-analysis, contribute how much to gains in data quality and disease locus detection in real data scenarios. Such evaluations are still limited by the high computational burden of genotype imputation and the lack of large genome-wide IPD. The question remains how much can be gained by mega-analysis and whether it is worth the effort. We will therefore conduct a systematic evaluation of each aspect of a mega-analysis of large IPD versus a meta-analysis of study-specific aggregated statistics. We will quantify the gain in data quality and in the ability to detect disease loci in a large, real genome-wide IPD data set, and will combine information from this real data set with simulation approaches to expand the range of scenarios. We will conduct the comparisons on three levels: quality control, imputation, and modelling. One special focus will be rare variants.
Based on our previous work, we are in the unique situation of having a large IPD data set at hand (> 40,000 persons with and without age-related macular degeneration, AMD). This disease and our data provide an ideal model case, since the data set comprises > 12 million genetic variants, including 160,000 rare protein-altering variants, and exhibits 34 detectable AMD loci.
The results of this project will help future genetic studies decide whether gathering IPD for mega-analysis is worth the effort and improves disease locus detection. They will also help to understand how to analyse large multi-center GWAS in which all data are available as IPD.
DFG Programme
Research Grants