Project Details

High performance algorithms to perform up-to-date and scalable metagenomics analysis

Subject Area Bioinformatics and Theoretical Biology; Biological and Biomimetic Chemistry; Medical Informatics and Medical Bioinformatics
Term from 2021 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 458163427
 
Final Report Year 2023

Final Report Abstract

Metagenomics is the study of the complete collective genomic content of the organisms in a specific environment, bypassing the cultivation of clonal cultures. Analyzing communities as a whole is not trivial because of their complexity and high diversity and the limits of the available technologies for translating genomic content into comprehensible data. One of the most critical aspects is data complexity and size. The fast and constant growth of public sequence repositories contributes greatly to the success of metagenomics applications. However, these repositories are growing at a much faster pace than the resources needed to use them properly. This challenges computational methods, which struggle to take full advantage of the massive and rapid data generation. In this project I implemented high-performance algorithms, libraries and pipelines that facilitate and make better use of the constantly growing data for metagenomics applications. The main contribution is the ganon2 software, an alignment-free genomic sequence classifier that represents a generational leap in performance and usability over both its first version and state-of-the-art competitors. ganon2 indexes and queries large sets of genomic data efficiently, enabling the use of more diverse and comprehensive references in microbiome studies and improving the resolution of results. Beyond performance, and most importantly, sensitivity and precision are greatly improved, owing both to up-to-date references and to new software features and optimizations. Additionally, a set of tools and pipelines was improved or newly developed to support the goals of this project. They are integrated into ganon2 and also available as standalone open-source contributions: genome_updater, a script to download and update any set of sequences from NCBI; MultiTax, a Python package to obtain, parse and explore several biological taxonomies; GRIMER, a portable and interactive dashboard for the visual analysis of microbiome studies; and MetaBench, a complete pipeline to run and benchmark tools, evaluating their performance with an interactive visualization of metrics. These tools are interconnected and together form an improved metagenomics analysis workflow.

