Project Details

High-throughput genotyping of complex loci in humans and microbes

Subject Area: Bioinformatics and Theoretical Biology; Immunology; Parasitology and Biology of Tropical Infectious Disease Pathogens
Term: 2018 to 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 405049479

Final Report Year: 2024

Final Report Abstract

High-throughput DNA sequencing enables the characterization of microbial communities and pathogen transmission dynamics. In the DFG-funded research project “High-throughput genotyping of complex loci in humans and microbes”, we developed improved methods for characterizing microbial communities with long-read sequencing technologies; we also developed, and demonstrated the utility of, a system for genomics-based contact tracing during the SARS-CoV-2 pandemic.

First, in a joint project with the Treangen Group at Rice University, we developed the “Emu” algorithm for the analysis of full-length 16S Oxford Nanopore sequencing data. The 16S ribosomal RNA gene is universally present in bacterial and archaeal genomes and is a widely used marker of community composition; by characterizing the 16S gene over its entire length, long-read sequencing may increase the precision of 16S-based community composition analyses compared to short-read sequencing. Emu analyzes long-read data by combining two components: an adaptive alignment likelihood model, which measures the similarity between individual reads and 16S reference sequences, and the Expectation Maximization (EM) algorithm, which models overall community composition. This combination improves the assignment of individual reads by borrowing information from the complete read set. In a series of validation experiments, we demonstrated that Emu is the most accurate algorithm for the analysis of Oxford Nanopore full-length 16S sequencing data, and we confirmed the potentially superior accuracy of long-read-based 16S sequencing compared to short-read sequencing.

Second, during the SARS-CoV-2 pandemic, we developed a system for the integrated genomic surveillance of viral transmission in the general population (in collaboration with the Düsseldorf Health Authority, the Timm and Pfeffer groups, and commercial diagnostic labs). We implemented one of the highest-intensity viral genome sequencing programmes in Europe, leveraging the real-time sequencing capabilities of the Oxford Nanopore technology for rapid data generation; developed pipelines to continuously analyze incoming sequencing data and identify potential infection clusters in real time; and implemented interfaces to the systems of the Düsseldorf Health Authority for the integration of sequencing-based analyses with classical contact tracing. The system enabled the detection and characterization of hundreds of otherwise undetectable infection clusters and chains; the characterization of large-scale viral population structure; the quantification of the impact of individual travel-imported infections; and an analysis of infection contexts. Most importantly, we demonstrated both the feasibility and the utility of genomics-based contact tracing in the general population during an ongoing pandemic, with important implications for pandemic preparedness.
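
The abstract describes Emu as combining per-read alignment likelihoods with the EM algorithm, so that ambiguous reads are assigned using information from the whole read set. The following is a minimal, generic sketch of that idea and is not Emu's actual implementation: the function name `em_abundance`, the toy likelihood values, and the convergence settings are all illustrative assumptions, and the per-read likelihoods are taken as given rather than derived from alignments.

```python
def em_abundance(likelihoods, n_iter=500, tol=1e-10):
    """Generic EM sketch (hypothetical helper, not Emu's code).

    likelihoods[r][t] is an assumed likelihood of read r given taxon t.
    Every read is assumed to have a positive likelihood for at least
    one taxon, otherwise the normalisation below would divide by zero.
    """
    n_taxa = len(likelihoods[0])
    # Start from a uniform community composition.
    abundance = [1.0 / n_taxa] * n_taxa
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each taxon for each read,
        # weighting the read likelihood by the current abundance.
        posteriors = []
        for row in likelihoods:
            weighted = [lik * ab for lik, ab in zip(row, abundance)]
            total = sum(weighted)
            posteriors.append([w / total for w in weighted])
        # M-step: new abundance is the mean posterior across all reads.
        updated = [
            sum(p[t] for p in posteriors) / len(posteriors)
            for t in range(n_taxa)
        ]
        change = sum(abs(u - a) for u, a in zip(updated, abundance))
        abundance = updated
        if change < tol:
            break
    return abundance, posteriors


# Toy input (made-up values): three reads favour taxon 0, one favours
# taxon 1, and the last read matches both references equally well.
toy_likelihoods = [
    [0.9, 0.1],
    [0.9, 0.1],
    [0.9, 0.1],
    [0.1, 0.9],
    [0.5, 0.5],
]
toy_abundance, toy_posteriors = em_abundance(toy_likelihoods)
print("estimated abundance:", toy_abundance)
print("posterior for the ambiguous read:", toy_posteriors[-1])
```

In this toy setting the last read cannot be assigned from its own likelihoods alone; its posterior leans toward the taxon that the rest of the read set supports more strongly, which is the "borrowing information" effect the abstract refers to.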
