Effiziente Berechnung akkurater multipler struktureller RNA-Alignments mittels mathematischer Optimierung
Final Report Abstract
During the last years, the number of genes known for producing non-coding functional RNA has increased significantly, and it is assumed that many more of these ncRNA genes are still undiscovered. Yet, many functional classes of RNA show little sequence conservation, but rather a conserved secondary structure. A promising avenue to detecting new functional RNAs is thus to look for common structural features shared by a set of related sequences. With this project we intended to advance our recently developed algorithmic theory to compute reliable multiple structural alignments of potentially long RNA molecules under a reasonable consumption of computational resources. We employed methods from mathematical programming such as Lagrangian relaxation and solved the problem as an integer linear program resulting from a graph-theoretical reformulation. In the first phase of this project, we developed theory and algorithms based on mathematical optimization for pairwise and multiple structural alignment of RNA molecules and implemented the findings in freely available software tools (lara, plara, tlara). We demonstrated that our software tools show are among the top programs in terms of speed and alignment quality. In addition, we developed a fast tool to discover novel microRNAs. In the second phase of the project, which took place at Centrum Wiskunde & Informatica (CWI) in Amsterdam, we transferred the results to the area of protein structure alignment. Structural similarities between proteins are extremely common. This is for various reasons. First, there is only a limited number of overall structural arrangements, or so- called protein folds. Then, parts of proteins which constitute evolutionary entities, so-called domains, re-appear in many different proteins, because they are deleted, inserted, swapped, mutated, etc. Further, there is a high evolutionary pressure on conserving protein function and hence on conserving protein structure during evolution. Proteins which share a common ancestor, i.e., homologs, thus often have a similar structure. Finally, convergent evolution may lead to similar structures for similar functions. Because of this structure-function relationship, protein similarities help us to learn about the evolution and biological tasks of proteins and detecting them reliably is an importnat task. Our contribution to the structure alignment problem was two-fold. First, we cast the problem in mathematical models and designed exact algorithms to solve it. To this end we formulated general integer linear programs for inter-residue distance matrixbased structure alignment, which allowed us to classify many existing structure alignment approaches into a common framework. Finally, we solved these models to optimality by designing exact algorithms. Our second contribution was the application of our algorithms to problems that can only be tackled by the use of exact algorithms. For example, we computed provably better alignments, obtain alignment quality guarantees and accurately quantify protein similarities. Further, we evaluated heuristic algorithms and rigorously compared scoring schemes for protein structure alignment. Finally, we provided all these services to structural biologists via a web server.
Publications
-
Accelerated microRNA precursor detection using the Smith-Waterman algorithm on FPGAs, in Proc. of GCCB 2006 (International Workshop on Distributed, High-Performance, and Grid Computing in Computational Biology, Eilat, Israel, January 21-24, 2007), W. Dubitzky et al. (eds.), Lecture Notes in Bioinformatics, Vol. 4360, pp. 19-32, Springer, Berlin, 2007
Patrick May, Gunnar W. Klau, Markus Bauer, and Thomas Steinke
-
Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization. BMC Bioinformatics, Vol. 8, No. 271, BioMed Central, 2007
Markus Bauer, Gunnar W. Klau and Knut Reinert
-
An exact mathematical programming approach to multiple RNA sequence-structure alignment. Algorithmic Operations Research, Vol. 3, No. 2, 2008
Markus Bauer, Gunnar W. Klau and Knut Reinert
-
Towards optimal alignment of protein structure distance matrices. Bioinformatics 26(18):2273-2280, 2010
Inken Wohlers, Francisco S. Domingues, Gunnar W. Klau
-
CSA: Comprehensive comparison of pairwise protein structure alignments. Nucleic Acids Research, Vol. 40, Nr. W1, pp. 303 - 309, 2013
Inken Wohlers, Noel Malod-Dognin, Rumen Andonov, and Gunnar W. Klau
-
DALIX: Optimal DALI protein structure alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10(1):26-36, 2013
Inken Wohlers, Rumen Andonov, and Gunnar W. Klau