Project Details

High performance algorithms to perform up-to-date and scalable metagenomics analysis

Subject Area: Bioinformatics and Theoretical Biology; Biological and Biomimetic Chemistry; Medical Informatics and Medical Bioinformatics
Term: from 2021 to 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 458163427
 
Metagenomics enables the discovery and study of the collective genomic content of diverse environments through computational methods and data analysis. The fast growth of public sequence repositories contributes greatly to the success of metagenomics applications. However, those repositories are growing at a much faster pace than the resources available to use them. This challenges current methods, which struggle to take full advantage of the massive and rapid data generation. Additionally, reference selection and acquisition, taxonomy definition, statistical analysis and visualization pose further challenges to fully and properly exploring environmental data. This project will provide a collection of interconnected methods to mitigate those issues, with a focus on the development of high performance algorithms and their implementation in metagenomics tools. The central goal of the project is to enable comparative metagenomics analysis in a short time using the whole of the quickly growing number of openly available assembled sequences. At the same time, it will enable constant updates with the influx of new sequences. This will be achieved by implementing high performance algorithms coupled with efficient data structures to improve sequence indexing and classification. They will be developed on top of state-of-the-art methods, extending their capacity to index and analyze very large data sets while maintaining or increasing the precision and sensitivity of the final results. These algorithms are the core of a proposed workflow for high performance metagenomics analysis. Further, the workflow will enable reference sequence selection, acquisition and filtering. This is crucial for taking full advantage of data repositories that are currently under-explored due to the lack of metadata, the presence of contamination and species over-representation. Additionally, the workflow will integrate a molecular-based taxonomy. This aims to resolve anomalies in current taxonomic definitions, improving the sensitivity of results. Finally, reports, visualizations and extended statistical analysis will be developed to translate raw results into intelligible output for single- and multi-sample metagenomics studies. In summary, the workflow proposed here aims to advance the state of the art in high performance metagenomics sequence classification and analysis, including data collection, extended reporting and the integration of a sequence-based taxonomy.
DFG Programme: WBP Position
 
 
