Models, Algorithms, and High Performance Computing for Phylogenetic Inference: Towards Simultaneous Alignment and Tree Building with Maximum Likelihood

Applicant Professor Dr. Alexandros Stamatakis

Subject Area Bioinformatics and Theoretical Biology

Term from 2007 to 2013

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 59430316

Final Report Year 2014

Final Report Abstract

In this project, our main goal was to develop scalable, efficient, and parallel software for analyzing molecular and morphological data in an evolutionary context. In other words, we aimed to enable research in evolutionary biology by developing novel and faster algorithms and tools for analyzing ever-growing datasets. Parallel computing aspects became substantially more important with the emergence of next-generation sequencing (NGS) data during the course of the project. Because the initial goal of developing a simultaneous tree building and alignment tool proved too ambitious, we focused on the slightly easier problem of extending given alignments by short reads based on their phylogenetic signal. Together with the competitors who developed pplacer, we have pioneered the field of phylogeny-aware analyses of short reads. While we contributed to developing a simultaneous alignment and tree building tool using the interleaved approach, in which one alternates phases of tree inference with alignment refinement, the PI is not entirely convinced that this approach represents an optimal solution. Note that current 'truly' simultaneous Bayesian alignment and inference tools such as BAli-Phy can only handle alignments of up to 50 or 100 taxa, which are considered small nowadays. It is unclear at present whether it will ever be feasible to build such a tool given current methods and computing capacity. With respect to phylogenetic inference, we have contributed to improving search algorithms, substantially extended RAxML, and established it as a standard tool in phylogenetics. Moreover, we have proposed a plethora of algorithmic and technical means to accelerate the calculation of the phylogenetic likelihood function, which accounts for 90-95% of total run time in every likelihood-based phylogenetic inference tool, be it Bayesian or maximum likelihood-based.
It is important to note that most of the concepts we developed are not RAxML-specific, but can be applied to any likelihood-based phylogenetic inference program. We have also broadened our research focus through work on 'classic' NGS sequence analysis, such as a read mapper library, the development of discrete algorithms for the post-analysis of phylogenetic trees, and work on extending and accelerating population genetics codes. The large-scale analyses carried out in collaboration with biologists helped us to identify new problems and new directions of research that take into account the needs of the community. An indirect result of the grant is the establishment of the 'computational molecular evolution' summer school series, which will take place for the 6th time in Heraklion, Crete, in 2014. In the final analysis, we have developed and made available a plethora of open source codes for phylogenetics, sequence analysis, and population genetics that allow for analyzing datasets that are at least one order of magnitude larger than was possible prior to this project.

Publications

A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology, 57(5):758–771, 2008
Alexandros Stamatakis, Paul Hoover, and Jacques Rougemont
A generic and versatile architecture for inference of evolutionary trees under maximum likelihood. In Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on, pages 829–835. IEEE, 2010
Nikolaos Alachiotis and Alexandros Stamatakis
Assessment of barrier implementations for fine-grain parallel regions on current multi-core architectures. In Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on, pages 1–8. IEEE, 2010
Simon A Berger and Alexandros Stamatakis
How many bootstrap replicates are necessary? Journal of Computational Biology, 17(3):337–354, 2010
Nicholas D Pattengale, Masoud Alipour, Olaf RP Bininda-Emonds, Bernard ME Moret, and Alexandros Stamatakis
MLTreeMap - accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics, 11(1):461, 2010
Manuel Stark, Simon Berger, Alexandros Stamatakis, and Christian von Mering
Aligning short reads to reference alignments and trees. Bioinformatics, 27(15):2068–2075, 2011
Simon A Berger and Alexandros Stamatakis
FPGA optimizations for a pipelined floating-point exponential unit. Reconfigurable Computing: Architectures, Tools and Applications, pages 316–327, 2011
Nikolaos Alachiotis and Alexandros Stamatakis
Morphology-based phylogenetic binning of the lichen genera Graphis and Allographa (Ascomycota: Graphidaceae) using molecular site weight calibration. Taxon, 60(5):1450–1457, 2011
Simon A Berger, Alexandros Stamatakis, and Robert Lücking
Metagenomic species profiling using universal phylogenetic marker genes. Nature Methods, 10(12):1196–1199, 2013
Sunagawa, Shinichi; Mende, Daniel R; Zeller, Georg; Izquierdo-Carrasco, Fernando; Berger, Simon A; Kultima, Jens Roat; Coelho, Luis Pedro; Arumugam, Manimozhiyan; Tap, Julien; Nielsen, Henrik Bjørn; Rasmussen, Simon; Brunak, Søren; Pedersen, Oluf; Guarner, Francisco; de Vos, Willem M; Wang, Jun; Li, Junhua; Doré, Joël; Ehrlich, S Dusko; ... & Bork, Peer
SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Molecular Biology and Evolution, 2013
Pavlidis, Pavlos; Živković, Daniel; Stamatakis, Alexandros & Alachiotis, Nikolaos
