Design, analysis, development and experimental validation of algorithms for high throughput sequencing mass data using the SeqAn library for biological sequence analysis

Subject Area Bioinformatics and Theoretical Biology
Term from 2010 to 2015
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 192954395
 
During the last five years, modern sequencing technologies have brought super-exponential growth in sequencing capacity. At the time of writing this proposal, a single sequencing machine can sequence about 30 billion nucleotides per day. This proposal aims to respond to this increase in genomic sequence data with algorithmic approaches that exploit redundancies across multiple datasets. More specifically, we aim at: 1) Developing a data structure that represents one or more genomic sequences by storing only their differences from a similar reference sequence, while maintaining the ability to navigate quickly in all sequences. We then use this data structure to develop algorithms that transform the substring index of a reference into the substring index of a new genome without rebuilding it from scratch, storing only the differences from the reference index. 2) Developing algorithms that efficiently process multiple genomes in parallel, based on the representation developed in 1). 3) Bridging the gap between algorithm theory and practical implementations by extending SeqAn both as a library that provides the core algorithmic components required to analyze large-scale genomic data and as an experimental platform to design, analyze, and implement state-of-the-art bioinformatics algorithms.
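The idea in aim 1) of storing only the differences from a reference while keeping fast navigation can be illustrated with a minimal sketch. This is not SeqAn code and not the project's actual data structure; the class name `DeltaSequence` and the edit format `(ref_pos, ref_len, alt)` are assumptions made for illustration. A derived sequence is kept as a sorted list of edits, and a binary search over precomputed edit start positions maps any position in the derived sequence back to either an edit or the reference.

```python
from bisect import bisect_right


class DeltaSequence:
    """Hypothetical sketch: a sequence stored as edits against a reference.

    Each edit is (ref_pos, ref_len, alt), meaning that
    reference[ref_pos:ref_pos + ref_len] is replaced by the string alt.
    Substitutions, insertions (ref_len == 0) and deletions (alt == "")
    all fit this form. Edits must not overlap.
    """

    def __init__(self, reference, edits):
        self.reference = reference
        self.edits = sorted(edits)
        self.starts = []  # start of each edit in derived-sequence coordinates
        shift = 0         # accumulated length change from earlier edits
        prev_end = 0
        for ref_pos, ref_len, alt in self.edits:
            if ref_pos < prev_end:
                raise ValueError("edits must not overlap")
            self.starts.append(ref_pos + shift)
            shift += len(alt) - ref_len
            prev_end = ref_pos + ref_len
        self.length = len(reference) + shift

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # Find the last edit that starts at or before position i.
        k = bisect_right(self.starts, i) - 1
        if k < 0:
            return self.reference[i]  # before the first edit
        ref_pos, ref_len, alt = self.edits[k]
        offset = i - self.starts[k]
        if offset < len(alt):
            return alt[offset]        # inside the edited text
        # Past the edit: continue in the reference after the replaced span.
        return self.reference[ref_pos + ref_len + offset - len(alt)]


reference = "ACGTACGT"
# One substitution, one insertion, one deletion relative to the reference.
genome = DeltaSequence(reference, [(1, 1, "T"), (4, 0, "GG"), (6, 1, "")])
print("".join(genome[i] for i in range(len(genome))))
```

In this sketch the memory cost grows with the number of edits rather than with the genome length, and a single lookup costs one binary search over the edits. Transforming a substring index in the same difference-based way, as the abstract describes, is a harder problem that this example does not attempt.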
DFG Programme Research Grants
 
 
