Detailseite
Projekt Druckansicht

FlipCut supertrees: Computing larger, better phylogenies faster

Fachliche Zuordnung Bioinformatik und Theoretische Biologie
Förderung Förderung von 2012 bis 2016
Projektkennung Deutsche Forschungsgemeinschaft (DFG) - Projektnummer 211926079
 
Erstellungsjahr 2017

Zusammenfassung der Projektergebnisse

Computing phylogenies commonly leads to NP-hard optimization problems. Local search heuristics that are used to solve these optimization problems do not scale well for large numbers of taxa. The goal of this project was to improve the swift polynomial-time supertree method FlipCut, to reach better supertree quality than the state-of-the-art supertree method MRP, which relies on a local search heuristic. Furthermore, we wanted to use our method to improve quality and scalability of sequence-base phylogenetic methods. We changed the objective function from Minimum Flipping (FlipCut supertrees) to Minimum Character Deletion (BCD supertrees) by using a specic weighting of the FlipCut graph. This new objective function has a more intuitive biological explanation, namely deleting as few clades from the input trees as possible to make them compatible. Furthermore, BCD allows a straightforward integration of meta information. We found that using bootstrap values signicantly improves supertree quality. We developed a preprocessing strategy for BCD that further improves quality and speed. We used the BCD supertree as a starting tree for a local search heuristic to boost a Maximum Likelihood supermatrix analysis. Finally, we developed a beam search modication for BCD that further improves supertree quality. We are now able to "compute larger, better phylogenies faster", as suggested in the title of our application: • We created a large-scale simulated dataset for supertree evaluation, with an average number 5500 taxa and 305 input trees. The evaluation on this dataset clearly demonstrated the benet of BCD compared to other methods: For example, MRP did not nish a single instance within two weeks, whereas BCD required between 4 h and 8 h of running time. • For the simulated datasets, BCD supertrees showed better F1 scores than all other evaluated supertree methods. Furthermore, used as a starting tree for a Maximum Likelihood supermatrix analysis, the BCD supertree signicantly improved the overall tree quality. • In our exhaustive evaluation, BCD was clearly the fastest supertree method. It is the only supertree method with guaranteed polynomial running time that can, with regards to supertree quality, compete with supertree methods based on Parsimony. In addition, it reduced the running time of a Maximum Likelihood supermatrix analysis when used as a starting tree. We have given a prove-of-concept that dividing the input data can improve the quality of sequence based tree estimation. Citing from our application, "By combining the FlipCut method with state-of-the-art programs for phylogenetic inference such as RAxML, we want to create a method for phylogenetic inference that is both faster and more accurate than, say, RAxML alone." We argue that this goal has indeed been acchieved, although our proof is only annecdotal at this point of time. BCD supertrees can nevertheless become a useful tool for "everyday" phylogenetic reconstruction, given the swiftness and the quality of the resulting supertrees. For that, BCD supertrees have several convenient features (polynomial running time, high quality, direct integration of condence values, no novel clades). Using BCD as part of divide-and-conquer strategy inspired by DACTAL is a promising idea for future research, and may allow us to apply maximum likelihood-based tree reconstruction methods for datasets with thousands of taxa.

Projektbezogene Publikationen (Auswahl)

 
 

Zusatzinformationen

Textvergrößerung und Kontrastanpassung