Scalable Algorithms for Reconstruction of Plant Phylogenies in Conjunction with the NSF (National Science Foundation) iPlant Collaborative

Applicant: Professor Dr. Alexandros Stamatakis

Subject Area: Plant Biochemistry and Biophysics

Term: from 2009 to 2012

Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 145491060

Final Report Year: 2013

Final Report Abstract

The main goal of this project was to develop the software required to infer the largest maximum-likelihood phylogeny of plants to date. Although the resulting tree is smaller than initially planned, mainly because of difficulties with data assembly and availability, we were able to infer the largest plant phylogeny to date. We also showed that such large trees are accurate enough for meaningful post-analyses such as the identification of diversification rate shifts. In addition, we developed and released the RAxML-Light software, which infers huge trees, with respect to both the number of taxa and the number of genes, on supercomputing systems using the Message Passing Interface (MPI). RAxML-Light is currently the only production-level software that offers a fine-grained MPI-based parallelization of the phylogenetic likelihood function (PLF) for inferring huge trees on distributed-memory systems. Its checkpointing feature is a prerequisite for deploying such codes on typical cluster and supercomputer configurations. We also conducted low-level technical work to exploit the features of modern hardware using 256-bit wide AVX vector instructions. This allows the code to make full use of the hardware resources available on modern servers. We also showed that the code can be used to infer trees comprising more than 100,000 taxa. The tree search convergence criterion we developed performs well and avoids unnecessary likelihood calculations in the asymptotic convergence phase of the tree search. The criterion has been integrated as a standard program option in RAxML and RAxML-Light. Using single-precision instead of double-precision floating-point arithmetic did not work well, because single-precision arithmetic led to a high degree of numerical instability that appears to be impossible to resolve. In addition, we explored novel tree search techniques and developed two novel mechanisms for reducing the memory footprint of likelihood-based analyses.
The two orthogonal techniques (subtree equality vectors, SEVs, and recomputation) are generic; that is, they can be deployed in all likelihood-based programs (ML and Bayesian inference codes). We are convinced that RAM requirements will constitute a limiting factor for future large-scale phylogenetic analyses. A still ongoing project is the development of the perpetual tree pipeline, which can be used to maintain an up-to-date, comprehensive reference tree containing all taxa of a specific taxonomic rank. This framework can be used to maintain perpetually updated trees for any organismal group. We anticipate making it publicly available by mid-2013. In summary, we have developed and released open-source code that allows for analyzing datasets one order of magnitude larger than was possible prior to this project. Moreover, we have shown that so-called mega-phylogenies inferred with RAxML-Light on real biological data are plausible and accurate. Finally, we have developed and published novel, generally applicable methods for reducing the number of floating-point operations and the memory requirements of likelihood calculations.
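The abstract does not say what caused the single-precision instability. As a general illustration only, and not a reconstruction of the RAxML-Light code, the sketch below shows one well-known way in which single precision is more fragile than double precision in likelihood-style computations: repeated multiplication of small probabilities underflows to zero much sooner. The helper names `to_f32` and `multiplications_until_zero`, and the factor 0.05, are hypothetical and chosen only for this example.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def multiplications_until_zero(factor: float, single: bool) -> int:
    """Count how often 1.0 can be multiplied by `factor` before the product is 0.0.

    With single=True the running product is rounded to single precision
    after every step, which emulates float32 storage (an assumption of
    this sketch, not a statement about any particular program).
    """
    value = 1.0
    steps = 0
    while value != 0.0:
        value *= factor
        if single:
            value = to_f32(value)
        steps += 1
    return steps


if __name__ == "__main__":
    # 0.05 stands in for a small per-site probability (illustrative value).
    print("single precision:", multiplications_until_zero(0.05, single=True))
    print("double precision:", multiplications_until_zero(0.05, single=False))
```

Running the script prints the two step counts; the single-precision product reaches zero after far fewer multiplications, so a single-precision implementation has much less numerical headroom before values must be rescaled.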

Publications

Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees. BMC Bioinformatics, 12(1):470, 2011
F. Izquierdo-Carrasco, S.A. Smith, and A. Stamatakis
Computing the phylogenetic likelihood function out-of-core. In Proceedings of the IPDPS-HiCOMB 2011, 2011
F. Izquierdo-Carrasco and A. Stamatakis
Understanding angiosperm diversification using small and large phylogenetic trees. American Journal of Botany, 98(3):404–414, March 2011
Stephen A. Smith, Jeremy M. Beaulieu, Alexandros Stamatakis, and Michael J. Donoghue
"Computing the Phylogenetic Likelihood Function Out-of-Core", IEEE HiCOMB 2011 workshop (held in conjunction with IPDPS 2011), Anchorage, USA, May 2011
Fernando Izquierdo-Carrasco
RAxML-Light: a tool for computing terabyte phylogenies. Bioinformatics, 28(15):2064–2066, 2012
Stamatakis, A.; Aberer, A.J.; Goll, C.; Smith, S.A.; Berger, S.A. & Izquierdo-Carrasco, F.
Trading running time for memory in phylogenetic likelihood computations. In Jan Schier, Carlos Manuel B. A. Correia, Ana L. N. Fred, and Hugo Gamboa, editors, BIOINFORMATICS, pages 86–95. SciTePress, 2012
Fernando Izquierdo-Carrasco, Julien Gagneur, and Alexandros Stamatakis
"Inference of Huge Trees under Maximum Likelihood", IPDPS PhD forum, Shanghai, China, May 2012
Fernando Izquierdo-Carrasco, Alexandros Stamatakis
"Trading Memory for Running Time in Phylogenetic Likelihood Computations", 2012 Bioinformatics conference, Vilamoura, Portugal, February 2012
Fernando Izquierdo-Carrasco
