Project Details

Design, analysis, development and experimental validation of algorithms for high throughput sequencing mass data using the SeqAn library for biological sequence analysis

Subject Area Bioinformatics and Theoretical Biology
Term from 2010 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 192954395
 
Over the last five years, modern sequencing technologies have brought super-exponential growth in sequencing capacity. At the time of writing this proposal, it is possible to sequence about 30 billion nucleotides per day using one sequencing machine. This proposal aims to respond to this increase in genomic sequence data with algorithmic approaches that benefit from redundancies across multiple datasets. More specifically, we aim at: 1) Developing a data structure that represents one or more genomic sequences by storing only the differences from a similar reference sequence while maintaining the ability to navigate quickly in all sequences. We then use this data structure to develop algorithms that transform the substring index of a reference into the substring index of a new genome without rebuilding it from scratch, storing only the differences from the reference index. 2) Developing algorithms that efficiently process multiple genomes in parallel, based on the representation developed in 1). 3) Bridging the gap between algorithm theory and practical implementations by extending SeqAn, both as a library providing the core algorithmic components required to analyze large-scale genomic data and as an experimental platform to design, analyze, and implement state-of-the-art bioinformatics algorithms.
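To make the idea in aim 1) concrete, the following is a minimal illustrative sketch of a sequence stored only as differences from a reference while still supporting random access. It is not the data structure developed in the project and not part of the SeqAn API; the class name `DeltaSequence`, the `(ref_pos, ref_len, alt)` delta format, and the example sequences are all assumptions made for illustration.

```python
from bisect import bisect_right


class DeltaSequence:
    """Hypothetical sketch: a sequence stored as edits against a reference.

    Each delta is (ref_pos, ref_len, alt): replace reference[ref_pos:ref_pos + ref_len]
    with the string alt. Substitution: ref_len == len(alt); insertion: ref_len == 0;
    deletion: alt == "". Deltas must not overlap.
    """

    def __init__(self, reference, deltas):
        self.reference = reference
        self.deltas = sorted(deltas)
        self.starts = []  # start of each delta's alt text in derived coordinates
        shift = 0         # accumulated length change caused by earlier deltas
        prev_end = 0
        for ref_pos, ref_len, alt in self.deltas:
            if ref_pos < prev_end:
                raise ValueError("deltas overlap")
            self.starts.append(ref_pos + shift)
            shift += len(alt) - ref_len
            prev_end = ref_pos + ref_len
        self.length = len(reference) + shift

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Binary search for the last delta that starts at or before position i.
        k = bisect_right(self.starts, i) - 1
        if k < 0:
            return self.reference[i]  # before the first delta: identical to reference
        ref_pos, ref_len, alt = self.deltas[k]
        offset = i - self.starts[k]
        if offset < len(alt):
            return alt[offset]        # inside the altered text
        # Past the altered text: map back into the reference after the delta.
        return self.reference[ref_pos + ref_len + offset - len(alt)]


reference = "ACGTACGT"
# One substitution (C->T at 1), one insertion ("GG" before 4), one deletion (G at 6).
genome = DeltaSequence(reference, [(1, 1, "T"), (4, 0, "GG"), (6, 1, "")])
full = "".join(genome[i] for i in range(len(genome)))
```

In this sketch the memory cost grows with the number of differences rather than with the genome length, and each lookup costs a binary search over the deltas. Sharing one reference across many such objects is one simple way to picture how redundancy across multiple genomes could be exploited.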
DFG Programme Research Grants
 
 
