Project Details
Multicollinearity in the statistical genomics era: Proposals to account for dependencies between molecular covariates with application to animal breeding
Applicant
Dr. Dörte Wittenburg
Subject Area
Animal Breeding, Animal Nutrition, Animal Husbandry
Term
from 2017 to 2020
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 363504750
In animal breeding, molecular data (e.g. single nucleotide polymorphisms; SNPs) are incorporated as predictor variables in statistical models to improve the genomic evaluation of animals. This yields more precise estimates of breeding values for animals that have not yet been phenotyped, which is important for breeding purposes, and helps to elucidate the genetic architecture of some traits. Not only the effect size of a variant is relevant but also its position on the genome. With high-dimensional SNP data available, a causative variant can be pinpointed to a specific base pair on the genome. However, as the number of model parameters increases with a still growing number of SNPs, multicollinearity between covariates can affect the results of whole-genome regression methods. The objective of this study is to additionally incorporate the dependencies between the molecular covariates, which are due to linkage and linkage disequilibrium among chromosome segments, in order to obtain more accurate estimates of SNP effects. The theoretical covariance between SNP genotypes can be used to filter the whole set of SNPs so that fewer but representative predictor variables are retained. Furthermore, a joint approach is proposed that allows the simultaneous selection and shrinkage of relevant predictors. It is hypothesised that this method fulfils the requirements of genomic evaluation: the dependencies between SNPs are considered, smooth estimates are obtained within groups of highly correlated SNPs, and the solution is sparse among and also within these groups. Thus, genomic regions that affect a trait can be identified.
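The idea of filtering SNPs by their mutual dependence can be illustrated with a minimal sketch. It is not the project's method: the project refers to a theoretical covariance derived from linkage and linkage disequilibrium, whereas this sketch substitutes the empirical correlation computed from a genotype matrix. The function name `prune_correlated_snps`, the threshold of 0.8, and the toy genotype data are assumptions made for illustration only.

```python
import numpy as np


def prune_correlated_snps(genotypes, threshold=0.8):
    """Greedy filter (illustrative, hypothetical helper).

    genotypes: array of shape (n_animals, n_snps), coded 0/1/2.
    Assumes every SNP column is polymorphic (non-zero variance).
    Returns the column indices of the SNPs that are kept.
    """
    # Empirical correlation between SNP columns; a stand-in for the
    # theoretical covariance mentioned in the project description.
    corr = np.corrcoef(genotypes, rowvar=False)
    kept = []
    for j in range(genotypes.shape[1]):
        # Keep SNP j only if it is not highly correlated with any SNP kept so far.
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: 6 animals, 3 SNPs (made up for this example).
genotypes = np.array([
    [0, 0, 2],
    [1, 1, 0],
    [2, 2, 1],
    [0, 1, 2],
    [2, 2, 0],
    [1, 1, 1],
])

kept = prune_correlated_snps(genotypes, threshold=0.8)
print(kept)
```

In the toy data the second SNP is strongly correlated with the first and is dropped, while the third is retained. A greedy pass like this depends on the order of the SNPs; it only shows how a dependence measure can reduce the predictor set to representatives.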
DFG Programme
Research Grants