
GRK 1906: Computational Methods for the Analysis of the Diversity and Dynamics of Genomes

Subject Area: Basic Research in Biology and Medicine
Computer Science
Mathematics
Term: from 2013 to 2018
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 221270173

Year of creation: 2020

Summary of Project Results

Enabled by modern high-throughput analytic biotechnologies, genomic research has moved from studying single genomes to the concurrent analysis of multiple genomes. In this International Research Training Group (IRTG), we have developed new computational approaches targeting both (i) genome diversity, i.e., the variation between different samples, species, strains, individuals, cells, etc., and (ii) genomic dynamics originating from random mutations, recombination, evolutionary pressure and selection. We therefore subdivided our research program into several areas addressing diverse methodological needs.

In Area 1, “Scale-up call: Enhancing computational capacity”, the method of choice has been to develop new tools within modern distributed IT environments. In this way, high-performance computing becomes affordable and the algorithms are available close to the data. Within the IRTG, different approaches to scaling up have been pursued. Containerisation of applications (e.g. via Docker) enables easy deployment in distributed computing infrastructures, integration into workflow systems and reproducible analyses. Integrating existing tools and “dockerized” applications into the MapReduce streaming framework allows robust distribution in cloud environments. For other applications, algorithms have been implemented natively in the MapReduce framework. These approaches have been used successfully to apply metagenomics workflows and publish reproducible results, and to scale up both metagenomics analyses and comparative genome analyses.

Research in Area 2, “Data management: Basic storage and retrieval”, has focused on novel data structures that allow sequences to be stored efficiently along with high-level metadata. In particular, data structures for indexing and compressing pangenomes, together with algorithms for their functional analysis, have been developed. Furthermore, a data-warehouse-driven online tool for metadata-based studies of metagenomes has been developed.
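To make the idea of a pangenome index more concrete, the following Python sketch builds a minimal coloured k-mer index: every k-mer is mapped to the set of genomes (its "colours") that contain it, and identical colour sets are stored only once as a simple form of compression. This is a generic technique chosen purely for illustration and is not the data structure developed in the IRTG; the function names (`build_colour_index`, `core_kmers`, `compress_colour_sets`) and the toy genomes are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of seq (forward strand only, no canonicalisation)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colour_index(genomes: dict[str, str], k: int) -> dict[str, frozenset[str]]:
    """Map each k-mer to the set of genome names ("colours") in which it occurs."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq.upper(), k):
            index[km].add(name)
    return {km: frozenset(names) for km, names in index.items()}


def core_kmers(index: dict[str, frozenset[str]], n_genomes: int) -> list[str]:
    """Return the k-mers shared by all genomes (a crude 'core genome' signal)."""
    return sorted(km for km, names in index.items() if len(names) == n_genomes)


def compress_colour_sets(
    index: dict[str, frozenset[str]],
) -> tuple[dict[str, int], list[frozenset[str]]]:
    """Store each distinct colour set once; k-mers refer to it by an integer id."""
    class_ids: dict[frozenset[str], int] = {}
    kmer_to_class: dict[str, int] = {}
    for km, colours in index.items():
        # setdefault assigns the next free id the first time a colour set is seen
        kmer_to_class[km] = class_ids.setdefault(colours, len(class_ids))
    # order the distinct colour sets by their assigned id
    classes = [colours for colours, _ in sorted(class_ids.items(), key=lambda kv: kv[1])]
    return kmer_to_class, classes


if __name__ == "__main__":
    toy = {"g1": "ACGTAC", "g2": "ACGTTT", "g3": "TTACGT"}
    idx = build_colour_index(toy, k=3)
    print(core_kmers(idx, len(toy)))  # ['ACG', 'CGT']
    kmer_to_class, classes = compress_colour_sets(idx)
    print(len(idx), len(classes))  # 7 5
```

In the toy example, seven distinct k-mers share only five distinct colour sets, so each set is stored once and referenced by id. Production pangenome indexes typically also handle reverse complements and use succinct or compressed encodings; the dictionary-based version above only shows the underlying idea.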
For the development of new algorithms and methods (Areas 3–5), different fields of application were addressed. Most notably, researchers of the IRTG developed algorithms for the computational determination of functional RNAs, for the efficient grouping and clustering of NGS data, for reconstructing ancestral genomes including ancient DNA, for simulating the mutation process along the ancestral line of populations under selection, for the prediction and visualization of 3D protein-protein networks to identify and analyse drug-drug interactions, for microfluidics time-lapse image analysis and visualization, and for the visualization of molecular dynamics and co-location in mass spectrometry imaging (MSI) and polyomics data. The methodologies used range from the design of models, algorithms and data structures to machine learning.

