Project Details
High-throughput genotyping of complex loci in humans and microbes
Applicant
Professor Dr. Alexander Dilthey
Subject Area
Bioinformatics and Theoretical Biology
Immunology
Parasitology and Biology of Tropical Infectious Disease Pathogens
Term
from 2018 to 2024
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 405049479
Next-generation sequencing (NGS) has emerged as a fundamental tool for biomedical research and has enabled the characterization of hundreds of thousands of individual human and microbial genomes. Many genomic features of crucial importance for fundamental research, biotechnology, and human health, however, remain inaccessible to full NGS-based bioinformatic reconstruction. These include the most important immunogenetic regions of the human genome: the Major Histocompatibility Complex (MHC), which encodes in particular the Human Leukocyte Antigen (HLA) genes, and the Leukocyte Receptor Complex (LRC), which encompasses the Killer-cell immunoglobulin-like receptor (KIR) genes. They also include the haplotype structures associated with antibiotic and innate resistance as well as mobile elements in microbes.
The underlying reason is that standard analysis algorithms for NGS data rely on strong sequence homology to the canonical reference genome. This assumption is violated by the complex haplotype structures associated with immunogenetic regions, mobile elements, and non-homologous recombination in general.
It is now well accepted that genome graphs, a variation-aware approach for the analysis of sequencing data pioneered by us and others, play an important role in the genotyping of these genomic regions. Recent progress notwithstanding, comprehensive genotyping of the MHC and LRC remains an open problem.
Genome graphs are also well suited to support NGS-based genotyping of microbial genomes, in particular with respect to antimicrobial resistance and in metagenomic settings.
Therefore, the following work programme is proposed:
a) Development of algorithms for the complete and integrated characterization of the MHC region from whole-genome Illumina data, with an integrated model for HLA and non-HLA variation in the MHC;
b) Development of algorithms for the complete characterization of the LRC/KIR region from whole-genome Illumina data, including the discovery of novel haplotype structures;
c) Development of algorithms for real-time Nanopore-based microbial species/strain identification and resistance prediction, supporting isolate and mixed/metagenomic samples. Long reads and real-time sequencing make the Nanopore technology particularly suitable for metagenomic applications and potential future translation into diagnostics.
To evaluate the developed methods for human genetics, whole-genome Illumina data from public databases will be used. To evaluate the microbial identification/typing system, Nanopore sequencing data from environmental and biobanked samples will be generated at the BMFZ of the University of Düsseldorf.
In summary, we will develop improved algorithms for genome graphs and complex regions, unlocking two of the fastest-evolving and biomedically and scientifically most important regions of the human genome. We will also extend genome graphs to microbes and develop powerful bioinformatics tools for the analysis of real-time sequencing data.
DFG Programme
Research Grants