Project Details

A comparative approach to genome annotation in Tribolium

Subject Area General Genetics and Functional Genome Biology
Term from 2013 to 2016
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 102336348
 
Final Report Year 2017

Final Report Abstract

The aims and achievements of this project fall into two parts. Firstly, we provided the bioinformatics support that was necessary for the other applied projects of the iBeetle research group to be successful and effective. Secondly, we developed a new type of bioinformatics tool, comparative AUGUSTUS, that can be generally applied to annotate one or more eukaryotic genomes when closely related genomes (annotated or not) are available. In particular, comparative AUGUSTUS was applied to four Tribolium species to assess remaining errors in the annotation of T. castaneum.

Bioinformatics support of the screen. The RNA interference screen required the construction of a library of double-stranded RNAs from the genes that were then knocked down. In this project, we identified and selected the genes for the screen. Given the state of the art in eukaryotic genome annotation, a non-negligible number of annotation errors was to be expected, such as false exon boundaries, false positive exons, or falsely split or joined genes; we therefore simultaneously improved the previous annotation. Further, because the exon-intron structure of a gene is often well supported by evidence in some parts but uncertain in others, we selected substrings of the mRNA that were sufficiently long but also very likely to be correct. We improved the previous annotation, called the official gene set (OGS2), to a new annotation on the basis of extensive RNA sequencing data. When in doubt about new alternatives, we gave precedence to OGS2 genes in order to keep the annotation, which had been used by many, stable unless a change was clearly indicated. The new annotation of T. castaneum contains 1452 new genes that do not overlap any gene in OGS2, as well as numerous corrections to previously existing gene models.
This annotation is the first to include alternative splicing and untranslated regions, and it is in significantly better agreement with experimental data from next-generation transcriptome sequencing that had not been available when the previous annotation was performed. The new gene models were a basis not only for the screen in general but also for three other publications from the research group.

Development of new methods for comparative genome annotation. Even though the speedup of genome sequencing has led to situations in which a whole clade of related genomes frequently needs to be annotated, as is the case with Tribolium castaneum, Tribolium freemani, Tribolium madens and Tribolium confusum, almost all annotation methods annotate one genome at a time. Comparative methods had been developed that exploit the alignment with one or several other genomes while annotating a target genome, but those capable of considering more than two input genomes would still consider the gene structures in only one of them. Applying these methods to larger clades does not scale well with growing clade size, and their practical dissemination was limited despite demonstrated good accuracy. Within this project, components of the new comparative extension of the gene finder AUGUSTUS have been developed. Among them are a new Bayesian codon selection model and the program ESPOCA, which classifies sequence as coding or noncoding from a multiple genome alignment. Comparative AUGUSTUS annotates all genomes simultaneously and can consider evidence such as an existing annotation of one genome and RNA-Seq evidence for any subset of the genomes. When tested on a clade of 12 Drosophila species, the proportion of coding bases not found by AUGUSTUS could be reduced from ∼6.6% in the single-genome version to ∼3.2% in the new comparative version.
Further, comparative AUGUSTUS is able to map the existing annotation of one genome to another related genome (e.g. between cow and mouse) significantly more accurately than the commonly used methods based on the alignment of protein sequences to a genome. Both the comparative version of AUGUSTUS and the new tool ESPOCA are ongoing developments, and their final performance and practical impact are yet to be established.
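
The abstract counts new genes that do not overlap any gene in OGS2. As an illustration of what such a count involves, the following is a minimal sketch of an interval-overlap test. It is not the project's actual pipeline: the function name `novel_genes`, the tuple layout, the coordinate convention (1-based, inclusive) and the decision to ignore strand are all assumptions made for this example.

```python
from collections import defaultdict


def novel_genes(new_genes, reference_genes):
    """Return the names of new genes that overlap no reference gene.

    Hypothetical data layout: each gene is a tuple
    (name, chromosome, start, end) with 1-based inclusive coordinates.
    Strand is ignored here, which is an assumption of this sketch.
    """
    # Group reference intervals by chromosome so that genes are only
    # compared with genes on the same sequence.
    by_chrom = defaultdict(list)
    for _, chrom, start, end in reference_genes:
        by_chrom[chrom].append((start, end))

    novel = []
    for name, chrom, start, end in new_genes:
        # Two closed intervals overlap if each starts no later than
        # the other one ends.
        overlaps = any(s <= end and start <= e for s, e in by_chrom[chrom])
        if not overlaps:
            novel.append(name)
    return novel


# Toy coordinates, invented for illustration only.
reference = [("ref1", "chr1", 100, 200), ("ref2", "chr1", 500, 600)]
candidates = [
    ("new1", "chr1", 150, 250),  # overlaps ref1
    ("new2", "chr1", 300, 400),  # lies between ref1 and ref2
    ("new3", "chr2", 100, 200),  # no reference gene on chr2
]
print(novel_genes(candidates, reference))
```

The pairwise comparison is quadratic in the number of genes per chromosome; for a whole genome, sorted intervals or an interval tree would be the more usual choice.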
