Project Details

Deciphering protein-protein interactions via statistical boosting for multi-omics data

Subject Area Epidemiology and Medical Biometry/Statistics, Human Genetics
Term since 2025
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 574440500
 
Omics data, such as genomics (DNA), transcriptomics (gene activity), and proteomics (proteins), are essential in genetic research because they provide multiple layers of biological information. Large biobanks such as the UK Biobank and FinnGen collect these data from hundreds of thousands of people, including DNA, health records, and now protein data. Genome-wide association studies (GWAS) have revealed millions of genotype-phenotype associations and have led to polygenic scores, which summarize an individual's genetic predisposition and can serve as biomarkers for stratifying individuals. However, most of the genetic variants considered (single nucleotide polymorphisms, SNPs) lie in non-coding regions of the genome, which makes them difficult to interpret biologically. Proteomics adds valuable insight here by showing how protein levels change in disease, which can lead to better biomarkers.
Multiple omics layers can be analyzed together either by combining separate scores or by using integrative models that analyze them simultaneously. For example, protein quantitative trait loci (pQTL) studies connect genetic variants with protein levels, which can reveal how variants influence diseases and can aid drug development. Recent studies suggest looking at interactions between proteins rather than at single proteins alone: analyzing pairs of proteins or protein ratios can better reflect how biological pathways work together and can uncover new genetic links.
However, analyzing large-scale genomic data is especially challenging because it involves a large number of genetic variants (often p > 1,000,000) with a complex, highly correlated structure (linkage disequilibrium), which makes statistical modeling and variable selection computationally demanding. The proposed project aims to address this by developing a novel statistical boosting algorithm that models, in a multivariate framework, how two proteins and their interaction are influenced by genetics.
The method will extend snpboost, a statistical boosting method for genetic data that I developed for single outcomes, to fit a bivariate normal distribution for the two proteins while efficiently handling the high dimensionality and correlation structure of genomic data. Relevant variants will be selected separately for each parameter of this distribution, allowing researchers to pinpoint which variants affect each protein alone and which affect their combined action. This detailed insight will improve our understanding of disease mechanisms, support the development of robust biomarkers, and potentially identify new drug targets.
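To illustrate the variable-selection idea behind statistical boosting, the following is a minimal sketch of generic component-wise L2 boosting for a single continuous outcome. It is not the snpboost implementation and not the proposed bivariate method. The function names, the toy simulation, and the choice of variants 3 and 7 as causal are all hypothetical, and the sketch omits early stopping and any batching of variants, which a method for real genomic data would need.

```python
import random


def componentwise_l2_boost(columns, y, n_steps=100, learning_rate=0.1):
    """Component-wise L2 boosting for one continuous outcome (illustrative).

    columns: list of p lists, each holding n centered predictor values
             (e.g. centered genotype dosages for one variant).
    y:       list of n outcome values (e.g. one protein level).
    Returns the intercept and a list of p coefficients; a coefficient that
    is still exactly 0.0 belongs to a variant that was never selected.
    """
    n = len(y)
    intercept = sum(y) / n
    coefs = [0.0] * len(columns)
    residuals = [yi - intercept for yi in y]

    for _ in range(n_steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j, col in enumerate(columns):
            sxx = sum(v * v for v in col)
            if sxx == 0.0:
                continue  # a constant (monomorphic) variant carries no signal
            sxr = sum(v * r for v, r in zip(col, residuals))
            gain = sxr * sxr / sxx  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, sxr / sxx, gain
        if best_j is None:
            break  # no single variant improves the fit any further
        # Update only the best-fitting variant, shrunk by the learning rate.
        step = learning_rate * best_beta
        coefs[best_j] += step
        residuals = [r - step * v for r, v in zip(residuals, columns[best_j])]

    return intercept, coefs


def simulate(n=200, p=20, seed=1):
    """Toy data (hypothetical): genotypes coded 0/1/2, variants 3 and 7 causal."""
    rng = random.Random(seed)
    columns = []
    for _ in range(p):
        raw = [rng.choice([0, 1, 2]) for _ in range(n)]
        mean = sum(raw) / n
        columns.append([g - mean for g in raw])
    y = [1.0 * columns[3][i] - 0.8 * columns[7][i] + rng.gauss(0.0, 0.5)
         for i in range(n)]
    return columns, y


columns, y = simulate()
intercept, coefs = componentwise_l2_boost(columns, y, n_steps=50)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("selected variants:", selected)
```

Because only one variant is updated per iteration, variants that are never picked keep a coefficient of exactly zero, which is how boosting performs variable selection. In a distributional extension like the one described above, a selection step of this kind would presumably be run for each parameter of the bivariate normal distribution (the two means, the two standard deviations, and the correlation), so that each parameter ends up with its own set of selected variants.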
DFG Programme WBP Fellowship
International Connection Finland
 
 
