Project Details

GAIUS - Maintenance activities for the sustainability of AUGUSTUS

Subject Area: Bioinformatics and Theoretical Biology
Term: from 2018 to 2021
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 391397397
 
AUGUSTUS is a tool for the structural annotation of genes in genomic sequences. Structural genome annotation is the classical bioinformatics task of finding genes and identifying their exon-intron structure in a genome. It is commonly aided by RNA-Seq and by homology data from related genomes. Comprehensive state-of-the-art genome annotation also requires that a clade-specific statistical model with thousands of parameters be adjusted to the target genome. Gene prediction is carried out frequently and on most newly sequenced genomes. In the most recent independent assessment of genome annotation methods, AUGUSTUS was among the most accurate programs; for example, it achieved the highest gene-level sensitivity on human protein-coding genes, at 61%. For years, we have been observing a rising number of citations as well as a very high number of downloads and web service submissions.
AUGUSTUS was developed across different research projects, e.g. on new methods for homology integration, automatic training, RNA-Seq integration, alternative splicing and multi-genome annotation. Currently, AUGUSTUS is available to users as open source code written in C++ and through two web services. However, the usability of AUGUSTUS and its code quality have not yet been addressed. The lack of usability makes AUGUSTUS especially difficult to use for biologists with less IT experience. Current usability deficiencies cost many users valuable time and push others towards a more convenient but less suitable tool, which may ultimately lead to wrong conclusions or failed experiments. Furthermore, there is currently no system in place for sharing species-specific parameters across research groups that study related species. The lack of focus on code quality makes it difficult for other researchers to contribute. Moreover, since source code development currently happens in internal repositories at the University of Greifswald and there is no issue management in place to handle change requests and bug reports, it is difficult for other researchers to get involved in the development of AUGUSTUS.
We therefore propose to address the above-mentioned issues in the GAIUS project. We will improve usability through better documentation, easier interfaces and a unified pipeline script. A repository for parameter and data sharing will allow users to benefit directly from other users' work on related genomes. This will also support replicable research. Local installation of AUGUSTUS will become simpler via a Debian package and pipeline virtualization. Source code and issue management will be addressed, e.g. by using GitHub. The WebAUGUSTUS deployment infrastructure will be improved through the adoption of DevOps methods to facilitate future updates. Very importantly, the technical debt of the source code will be reduced through additional tests and refactoring of existing code.
DFG Programme: Research data and software (Scientific Library Services and Information Systems)
 
 
