Project Details

Lost in Tree Space (LiTS)

Subject Area Bioinformatics and Theoretical Biology
Theoretical Computer Science
Term from 2016 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 295143677
 
We propose to conduct research on two phenomena that can get us lost in tree space when conducting phylogenetic inferences. One is gene tree versus species tree discordance, which requires reconciliation; the second is the existence of terraces in tree space, which requires further scrutiny. Thus, our overarching goal is to better understand why we are lost in tree space and how we can navigate through tree space in a more targeted and computationally efficient manner. The specific projects we propose build upon the highly successful collaboration between the two labs during the two preceding funding periods, as well as on the experience accumulated by the junior researchers who were funded through the preceding grant. More specifically, we will develop methods and algorithms and make them available as open-source tools to (i) sample, enumerate, and summarize trees residing on a terrace in tree space, (ii) more efficiently search tree space and evaluate tree topologies in the presence of terraces under maximum likelihood and parsimony, and (iii) conduct scalable, efficient, and accurate gene tree species tree reconciliations.
Biological significance: The biological significance of our work is underlined by the fact that only a handful of easy-to-use likelihood-based gene tree species tree reconciliation tools exist. Although only a prototype implementation of GeneRax is available at present, and it lacks numerous desirable features, it is already being used by some early adopters. Given the large user base of RAxML-NG and IQ-TREE, every improvement in their search efficiency means that thousands of CPU hours can be saved. In addition, as shown by Dobrin, Zwickl, and Sanderson (2018), a plethora of current phylogenomic datasets contain terraces.
In other words, this is not an exotic theoretical property of search spaces, but a real problem with empirical data that needs to be addressed and better studied. If our initial findings on quasi-terraces are confirmed, the existence of terraces will affect a substantially larger fraction of empirical phylogenetic analyses, as the occurrence of terrace-like structures will not depend on a specific branch linkage model. As terraces occur in the presence of missing sequences, one could assume that sequencing complete genomes would solve the problem entirely. However, this is not the case. Because the biological diversity of species and gene deletions mean that not all genes are present in all organisms, missing sequences are an inherent property of large phylogenomic alignments. Therefore, missing data remains an important issue that phylogenomic software must account for systematically. This is especially important if we attempt to resolve the Tree of Life, which comprises extremely diverse species whose genomes contain different collections of genes.
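To make the link between missing sequences and terraces concrete, the following minimal Python sketch assumes the standard definition from the terrace literature, which this summary does not spell out: two trees lie on the same terrace if, for every gene partition, they induce the same subtree on the taxa that have data for that partition. Trees are represented as sets of bipartitions. The helper names (make_split, restrict, same_terrace) and the five-taxon toy data are hypothetical and only illustrate the idea; they are not taken from the project's tools.

```python
from typing import FrozenSet, Iterable, Set

# An unrooted bipartition, stored as an unordered pair of taxon sets.
Split = FrozenSet[FrozenSet[str]]


def make_split(side_a: Iterable[str], side_b: Iterable[str]) -> Split:
    """Build one bipartition from its two sides."""
    return frozenset({frozenset(side_a), frozenset(side_b)})


def restrict(tree: Set[Split], taxa: Set[str]) -> Set[Split]:
    """Induced subtree on `taxa`: drop absent taxa, keep non-trivial splits."""
    induced = set()
    for split in tree:
        sides = [side & taxa for side in split]
        # A split is informative only if both sides keep at least two taxa.
        if all(len(side) >= 2 for side in sides):
            induced.add(frozenset(sides))
    return induced


def same_terrace(tree_a: Set[Split], tree_b: Set[Split],
                 partitions: Iterable[Set[str]]) -> bool:
    """True if both trees induce identical subtrees for every partition."""
    return all(restrict(tree_a, taxa) == restrict(tree_b, taxa)
               for taxa in partitions)


# Toy data (assumed): three trees on the taxa a..e, given by internal splits.
t1 = {make_split("ab", "cde"), make_split("de", "abc")}  # ((a,b),c,(d,e))
t2 = {make_split("ac", "bde"), make_split("de", "abc")}  # ((a,c),b,(d,e))
t3 = {make_split("ad", "bce"), make_split("be", "acd")}  # ((a,d),c,(b,e))

# Gene 1 lacks a sequence for taxon c, gene 2 lacks one for taxon b.
missing_data = [set("abde"), set("acde")]
# The same two genes with sequences for every taxon.
complete_data = [set("abcde"), set("abcde")]

print(same_terrace(t1, t2, missing_data))   # t1 and t2 are indistinguishable
print(same_terrace(t1, t3, missing_data))   # t3 differs on gene 1
print(same_terrace(t1, t2, complete_data))  # full data separates t1 and t2
```

In this toy case the two distinct topologies t1 and t2 cannot be told apart once each gene is missing one taxon, whereas the same genes with complete taxon coverage separate them.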
DFG Programme Research Grants
International Connection Austria
Cooperation Partner Professor Dr. Arndt von Haeseler
 
 
