On the pros and cons of mega-analyses compared to meta-analyses for genome-wide association studies
Final Report Abstract
Genome-wide association studies (GWAS) require very large sample sizes because testing millions of genetic variants imposes a heavy multiple-testing burden. GWAS are therefore usually conducted across numerous studies, typically by combining study-specific summary statistics in a "meta-analysis". We asked whether disease locus detection could be improved by collecting individual participant data (IPD) from all studies and analyzing them as one large data set (mega-analysis) instead of conducting a meta-analysis. Mega-analysis comprises several steps, of which imputation is computationally the most costly. We thus investigated whether mega-imputation (i.e. imputation across all studies) has a benefit over meta-imputation (i.e. imputation per study, at substantially lower computational cost). We also investigated whether association analysis of the mega- or meta-imputed data with models of differing complexity induces bias or yields benefits. For this work, we had unique IPD from numerous studies on age-related macular degeneration (AMD) at hand to exemplify the pros and cons. AMD is an ideal model disease because of its numerous known genetic loci with large effects. Our work, which involved numerous imputation runs, separation of the phasing step from the variant-imputation step, and analyses with a suite of statistical models, yielded several important findings. Briefly, we showed that mega-imputation yields more well-imputed variants, particularly rare variants. If common variants are the main focus, the gain is limited. We also found that meta-imputation requires statistical analyses that account for potential study-related confounders, if known, and we recommend a meta-analysis to avoid confounding by unknown study-specific characteristics. While mega-analysis with a simple model that does not account for study can better identify disease loci, genetic effect estimation benefits from a model that accounts for study.
We followed these recommendations in our GWAS on early stages of AMD: we did not mega-impute, despite having most of the multi-study data at hand as IPD, which substantially accelerated this work, initially planned as a mega-analysis. This work provided important insights into genetic variants that trigger early AMD versus those that trigger only late AMD, suggesting mechanisms for progression. The methodological work on scrutinizing genetic effect estimation by appropriate statistical modelling and the work on early AMD motivated the next related work: we classified early and late AMD in 60,000 individuals of UK Biobank based on >170,000 fundus images via a machine learning approach. Such automated classification approaches are pivotal to facilitate AMD research in large-scale multi-site mega-data, e.g. from NAKO. We used GWAS to quality control this machine-learning-derived disease classification, incorporating confusion matrix information from >2000 manually graded individuals. We found relevant uncertainty in the machine-learning-derived classification. The statistical approach we developed to account for this uncertainty highlighted true AMD loci, but also a pseudo-signal coding for eye color. Overall, our project was very successful in answering the questions posed in the project proposal. Developed software and pipelines are made available as open source. The results are highly relevant from a methodological perspective and help guide future study design and analyses in large-scale GWAS.
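To illustrate what combining study-specific summary statistics by meta-analysis can look like for a single variant, the following is a minimal sketch of an inverse-variance weighted fixed-effect meta-analysis. The abstract does not state which meta-analysis method was used, so the choice of this estimator, the function name `fixed_effect_meta`, and the input numbers are assumptions made purely for illustration.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors of those estimates
    Returns the pooled estimate, its standard error, the z-score,
    and a two-sided p-value under a normal approximation.
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    weight_sum = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_sum
    pooled_se = math.sqrt(1.0 / weight_sum)
    z = pooled_beta / pooled_se

    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical summary statistics for one variant from two studies.
beta, se, z, p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
print(beta, se, z, p)
```

Because each study contributes only its own estimate and standard error, study-level differences do not enter the pooled estimate as confounders, which is in line with the recommendation above to prefer meta-analysis when study-specific characteristics are unknown.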
Publications
- On the differences between mega- and meta-imputation and analysis exemplified on the genetics of age-related macular degeneration. Genet Epidemiol 2019;43(5):559-76
Gorski M, Guenther F, Winkler TW, Weber BHF, Heid IM
(See online at https://doi.org/10.1002/gepi.22204)
- Chances and challenges of machine learning-based disease classification in genetic association studies illustrated on age-related macular degeneration. Genet Epidemiol 2020;44(7):759-777
Guenther F, Brandl C, Winkler TW, Wanner V, Stark K, Kuechenhoff H, Heid IM
(See online at https://doi.org/10.1002/gepi.22336)
- Genome-wide association meta-analysis for early age-related macular degeneration highlights novel loci and insights for advanced disease. BMC Med Genomics 2020;13(1):120
Winkler TW, Grassmann F, Brandl C, Kiel C, Guenther F, Strunz T, Weidner L, Zimmermann ME, Korb CA, Poplawski A, Schuster AK, Muller-Nurasyid M, Peters A, Rauscher FG, Elze T, Horn K, Scholz M, Canadas-Garre M, McKnight AJ, Quinn N, Hogg RE, Kuchenhoff H, Heid IM, Stark KJ, Weber BHF
(See online at https://doi.org/10.1186/s12920-020-00760-7)