Project Details
Projekt Print View

Algorithmic Foundation for Genome Assembly: Streaming and External Memory Techniques

Subject Area Theoretical Computer Science
Term from 2014 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 255256381
 
De novo genome assembly is a fundamental task in life science. It requires the solution of combinatorial optimization problems in very large graphs requiring memory in terabyte range. The scientific progress is hampered by the fact that existing algorithms for such problems are not designed to cope with the big data issue in a memory and time efficient way. In the first two years of the projectwe have carried out the ground work for an innovative algorithmics for genome assembly in close cooperation with life science groups in Kiel. A key result is that new algorithms for substructures in large graphs, like long paths or Eulerian tours, can be computed in a memory-efficient way with streaming techniques. Furthermore, we presented an algorithm leading to a drastic reduction of assembly graphs, but they are still large and we are perpetually challenged by big data. In the second phase of the project we will extensivelystudy graph problems related to genome assembly with external-memory techniques and more powerful streaming models. This algorithmic research is in particular based on the building of an external-memory library of assembly graphs, which we are already compiling in cooperation with the SPP group in Frankfurt. Concerning the design of external memory and (assembly-)graph drawing algorithms we expect scientific progress and synergies in cooperation with other groups in the SPP, who are expert in these areas. Our memory-efficient algorithms will be implemented and applied to genome and transcriptome assembly graphs arising fromshotgun sequencing in life science groups in Kiel, e.g., marine ecology and clinical molecular biology.
DFG Programme Priority Programmes
 
 

Additional Information

Textvergrößerung und Kontrastanpassung