A comparative approach to genome annotation in Tribolium
Final Report Abstract
The aims and achievements of this project fall into two parts. First, we provided the bioinformatics support that the other applied projects of the iBeetle research group needed in order to be successful and effective. Second, we developed a new type of bioinformatics tool, comparative AUGUSTUS, that can be generally applied to annotate one or more eukaryotic genomes when closely related genomes, annotated or not, are available. In particular, comparative AUGUSTUS was applied to four Tribolium species to assess remaining errors in the annotation of T. castaneum.
Bioinformatics support of the screen. The RNA interference screen required the construction of a library of double-stranded RNAs from the genes that were then to be knocked down. In this project, we identified and selected the genes for the screen. Given the state of the art in eukaryotic genome annotation, a non-negligible number of annotation errors was to be expected, such as falsely predicted exon boundaries, false positive exons, or genes that were falsely split or joined. We therefore improved the previous annotation at the same time. Further, because the exon-intron structure of a gene is often well supported by evidence in some parts and uncertain in others, we selected substrings of the mRNA that were sufficiently long but also very likely to be correct. We improved the previous annotation, the official gene set (OGS2), to a new annotation on the basis of extensive RNA sequencing data. When in doubt about new alternatives, we gave precedence to OGS2 genes in order to keep the annotation, which had been used by many, stable unless a change was clearly indicated. The new annotation of T. castaneum contains 1452 new genes that do not overlap any gene in OGS2 and numerous corrections to previously existing gene models.
This annotation is the first to include alternative splicing and untranslated regions, and it is in significantly better agreement with experimental data from next-generation transcriptome sequencing, which had not been available when the previous annotation was performed. The new gene models were a basis not only for the screen in general but also for three other publications from the research group.
Development of new methods for comparative genome annotation. The speedup of genome sequencing has led to situations in which a whole clade of related genomes frequently needs to be annotated, as is the case with Tribolium castaneum, Tribolium freemani, Tribolium madens and Tribolium confusum. Nevertheless, almost all annotation methods annotate one genome at a time. Comparative methods had been developed that exploit the alignment with one or several other genomes while annotating a target genome, but those capable of considering more than two input genomes would still only consider the gene structures in one of them. Applications to larger clades do not scale well with growing clade size, and the practical dissemination of these methods was limited despite demonstrated good accuracy. Within this project, components of the new comparative extension of the gene finder AUGUSTUS have been developed, among them a new Bayesian codon selection model and the program ESPOCA for classifying sequence as coding or noncoding from a multiple genome alignment. Comparative AUGUSTUS annotates all genomes simultaneously and can consider evidence such as an existing annotation of one genome and RNA-Seq evidence for any subset of the genomes. When tested on a clade of 12 Drosophila species, the fraction of coding bases not found by AUGUSTUS was reduced from ∼6.6% in the single-genome version to ∼3.2% in the new comparative version.
Further, comparative AUGUSTUS is able to map the existing annotation of one genome to another related genome (e.g. between cow and mouse) significantly more accurately than the commonly used methods based on the alignment of protein sequences to a genome. Both the comparative version of AUGUSTUS and the new tool ESPOCA are ongoing developments, and their final performance and practical impact are yet to be established.
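The "coding bases not found" figure quoted for the Drosophila test reads as a base-level miss rate: the share of reference coding bases that no predicted coding region covers. The following is a minimal Python sketch of such a measure, assuming half-open (start, end) intervals on a single sequence and strand. The function names and the toy coordinates are hypothetical, and this is not necessarily how the reported values were computed.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_bases(reference, predicted):
    """Count reference bases that are also covered by a predicted interval."""
    ref = merge_intervals(reference)
    pred = merge_intervals(predicted)
    total = 0
    i = j = 0
    # Two-pointer sweep over both sorted, non-overlapping interval lists.
    while i < len(ref) and j < len(pred):
        lo = max(ref[i][0], pred[j][0])
        hi = min(ref[i][1], pred[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if ref[i][1] < pred[j][1]:
            i += 1
        else:
            j += 1
    return total


def missed_fraction(reference, predicted):
    """Fraction of reference coding bases not covered by any prediction."""
    ref_len = sum(e - s for s, e in merge_intervals(reference))
    if ref_len == 0:
        raise ValueError("reference contains no coding bases")
    return (ref_len - covered_bases(reference, predicted)) / ref_len


# Toy example (hypothetical coordinates, not data from the project).
reference_cds = [(0, 100), (200, 300)]
predicted_cds = [(0, 90), (210, 300), (400, 500)]
print(missed_fraction(reference_cds, predicted_cds))
```

In the toy example, 20 of the 200 reference coding bases are missed. The predicted interval that lies outside the reference does not affect this measure, which captures sensitivity only and says nothing about false positive predictions.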
Publications
-
The iBeetle large-scale RNAi screen reveals gene functions for insect development and physiology. Nature Communications, 6, 2015
Christian Schmitt-Engel, Dorothea Schultheis, Jonas Schwirz, Nadi Ströhlein, Nicole Troelenberg, Upalparna Majumdar, Daniela Grossmann, Tobias Richter, Maike Tech, Jürgen Dönitz, et al.
-
iBeetle-Base: a database for RNAi phenotypes in the red flour beetle Tribolium castaneum. Nucleic Acids Research, page gku1054, 2014
Jürgen Dönitz, Christian Schmitt-Engel, Daniela Grossmann, Lizzy Gerischer, Maike Tech, Michael Schoppmeier, Martin Klingler, and Gregor Bucher
-
Tissue-specific transcriptomics, chromosomal localization, and phylogeny of chemosensory and odorant binding proteins from the red flour beetle Tribolium castaneum reveal subgroup specificities for olfaction or more general functions. BMC Genomics, 15(1):1141, 2014
Stefan Dippel, Georg Oberhofer, Jörg Kahnt, Lizzy Gerischer, Lennart Opitz, Joachim Schachtner, Mario Stanke, Stefan Schütz, Ernst A Wimmer, and Sergio Angeli
-
Large scale RNAi screen in Tribolium reveals novel target genes for pest control and the proteasome as prime target. BMC Genomics, 16(1):674, 2015
Julia Ulrich, Upalparna Majumdar, Christian Schmitt-Engel, Jonas Schwirz, Dorothea Schultheis, Nadi Ströhlein, Nicole Troelenberg, Daniela Grossmann, Tobias Richter, Jürgen Dönitz, et al.
-
Simultaneous gene finding in multiple genomes. Bioinformatics, page btw494, 2016
Stefanie König, Lars Romoth, Lizzy Gerischer, and Mario Stanke