Project Details

Models, Algorithms, and High Performance Computing for Phylogenetic Inference: Towards Simultaneous Alignment and Tree Building with Maximum Likelihood

Subject Area Bioinformatics and Theoretical Biology
Term from 2007 to 2013
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 59430316
 
Final Report Year 2014

Final Report Abstract

The main goal of this project was to develop scalable, efficient, parallel software for analyzing molecular and morphological data in an evolutionary context. In other words, we aimed to enable research in evolutionary biology by developing novel, faster algorithms and tools for analyzing ever-growing datasets. Parallel computing became substantially more important during the course of the project with the emergence of NGS data. Because the initial goal of developing a simultaneous tree building and alignment tool proved too ambitious, we focused instead on the somewhat easier problem of extending given alignments by short reads based on their phylogenetic signal. Together with the competing group that developed pplacer, we pioneered the field of phylogeny-aware analyses of short reads. We also contributed to a simultaneous alignment and tree building tool that uses the interleaved approach, in which phases of tree inference alternate with phases of alignment refinement; however, the PI is not entirely convinced that this approach is an optimal solution. Note that current 'truly' simultaneous Bayesian alignment and inference tools such as BAli-Phy can only handle alignments of up to 50 or 100 taxa, which are considered small nowadays. It is currently unclear whether building such a tool will ever be feasible with current methods and computing capacity.
With respect to phylogenetic inference, we improved search algorithms, substantially extended RAxML, and established it as a standard tool in phylogenetics. Moreover, we proposed a plethora of algorithmic and technical means to accelerate the calculation of the phylogenetic likelihood function, which accounts for 90-95% of total run time in every likelihood-based phylogenetic inference tool, be it Bayesian or maximum likelihood-based. Importantly, most of the concepts we developed are not RAxML-specific and can be applied to any likelihood-based phylogenetic inference program.
We also broadened our research focus to include 'classic' NGS sequence analysis (for example, a read mapper library), discrete algorithms for the post-analysis of phylogenetic trees, and the extension and acceleration of population genetics codes. Large-scale collaborative analyses with biologists helped us to identify new problems and research directions that take the needs of the community into account. An indirect result of the grant is the establishment of the 'computational molecular evolution' summer school series, which will take place for the 6th time in Heraklion, Crete, in 2014. Overall, we have developed and made available a plethora of open source codes for phylogenetics, sequence analysis, and population genetics that allow for analyzing datasets at least one order of magnitude larger than was possible prior to this project.

Publications

  • A rapid bootstrap algorithm for the RAxML web servers. Systematic Biology, 57(5):758–771, 2008
    Alexandros Stamatakis, Paul Hoover, and Jacques Rougemont
  • A generic and versatile architecture for inference of evolutionary trees under maximum likelihood. In Signals, Systems and Computers (ASILOMAR), 2010 Conference Record of the Forty Fourth Asilomar Conference on, pages 829–835. IEEE, 2010
    Nikolaos Alachiotis and Alexandros Stamatakis
  • Assessment of barrier implementations for fine-grain parallel regions on current multi-core architectures. In Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on, pages 1–8. IEEE, 2010
    Simon A Berger and Alexandros Stamatakis
  • How many bootstrap replicates are necessary? Journal of Computational Biology, 17(3):337–354, 2010
    Nicholas D Pattengale, Masoud Alipour, Olaf RP Bininda-Emonds, Bernard ME Moret, and Alexandros Stamatakis
  • MLTreeMap - accurate maximum likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics, 11(1):461, 2010
    Manuel Stark, Simon Berger, Alexandros Stamatakis, and Christian von Mering
  • Aligning short reads to reference alignments and trees. Bioinformatics, 27(15):2068–2075, 2011
    Simon A Berger and Alexandros Stamatakis
  • FPGA optimizations for a pipelined floating-point exponential unit. Reconfigurable Computing: Architectures, Tools and Applications, pages 316–327, 2011
    Nikolaos Alachiotis and Alexandros Stamatakis
  • Morphology-based phylogenetic binning of the lichen genera Graphis and Allographa (Ascomycota: Graphidaceae) using molecular site weight calibration. Taxon, 60(5):1450–1457, 2011
    Simon A Berger, Alexandros Stamatakis, and Robert Lücking
  • Metagenomic species profiling using universal phylogenetic marker genes. Nature Methods, 10(12):1196–1199, 2013
    Shinichi Sunagawa, Daniel R Mende, Georg Zeller, Fernando Izquierdo-Carrasco, Simon A Berger, Jens Roat Kultima, Luis Pedro Coelho, Manimozhiyan Arumugam, Julien Tap, Henrik Bjørn Nielsen, et al.
    (See online at https://doi.org/10.1038/NMETH.2693)
  • SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Molecular Biology and Evolution, 2013
    Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis
    (See online at https://doi.org/10.1093/molbev/mst112)
 
 
