Automated characterization of microbial genomes and metagenomes through the collection and verification of association rules
Summary of project results
In recent years, sequencing prokaryotic genomes has become increasingly simple and cheap, thanks to high-quality sequencing technologies, such as the platforms offered by Illumina, Nanopore, Pacific Biosciences and others, and to powerful bioinformatics software that makes it easy to assemble the sequencing reads and annotate the resulting assembly. As a result, genomes of bacteria and archaea are increasingly sequenced and assembled without a subsequent thorough analysis of their contents. An annotation often includes several thousand genes and other features, so manual analysis is time-consuming. It is also increasingly less rewarding: sequencing and annotating a prokaryotic genome has become too easy to represent a significant scientific advance, and does not by itself justify a publication in a highly ranked journal. The goal of this project was to create a system that improves this situation by automatically finding the most interesting parts of a prokaryotic genome, based on expectations about its contents that exist prior to the genome analysis. Although what counts as interesting can be subjective, an analysis of the scientific literature in the field of genomics shows that such expectations are often expressed in publications and can be collected, logically analysed and applied to new genomes. In this project, a logical framework and a software infrastructure were developed that aim to provide the basis for collecting genome content expectations from the literature and verifying them. They were developed as a series of software packages, mainly for the programming language Python, together with format specifications and ontologies, which taken together represent a significant step towards the formalization of this kind of analysis.
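The idea of collecting expectations and applying them to a new genome can be illustrated with a minimal Python sketch. It is not the project's actual software or format: the `Expectation` class, its fields, the comparator set and the names `group_A`, `gene_X` and `gene_Y` are all hypothetical and chosen only for illustration. Each rule states, for a group of organisms, how many copies of a feature are expected; the rules that apply to a genome but are violated point to the parts that may deserve closer inspection.

```python
from dataclasses import dataclass
import operator

# Hypothetical set of comparison operators (an assumption of this sketch,
# not taken from the project's format specifications).
COMPARATORS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}


@dataclass(frozen=True)
class Expectation:
    group: str       # organism group the rule applies to
    feature: str     # genome content unit, e.g. a gene or a domain
    comparator: str  # one of the keys of COMPARATORS
    value: int       # expected count the observation is compared with
    source: str      # placeholder for the publication stating the rule

    def holds(self, counts):
        """Check the rule against observed feature counts (missing = 0)."""
        observed = counts.get(self.feature, 0)
        return COMPARATORS[self.comparator](observed, self.value)


def unexpected(expectations, groups, counts):
    """Return the rules that apply to the genome's groups but are violated."""
    return [e for e in expectations
            if e.group in groups and not e.holds(counts)]


# Illustrative rules and annotation counts; all names are made up.
rules = [
    Expectation("group_A", "gene_X", ">=", 1, "placeholder reference"),
    Expectation("group_A", "gene_Y", "==", 0, "placeholder reference"),
    Expectation("group_B", "gene_X", "==", 0, "placeholder reference"),
]
genome_groups = {"group_A"}
genome_counts = {"gene_X": 0, "gene_Y": 0}

violations = unexpected(rules, genome_groups, genome_counts)
for rule in violations:
    print(f"unexpected: {rule.feature} {rule.comparator} {rule.value} "
          f"expected for {rule.group}")
```

In this toy case only the rule about `gene_X` is reported, since the genome belongs to `group_A` and lacks the feature; the rule for `group_B` is skipped because it does not apply to this genome.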
Project-related publications (selection)
- Fastsubtrees: simple and efficient subtrees extractions in Python with applications to NCBI taxonomy. Journal of Open Source Software, 7(79), 4755.
  Aman Modi & Giorgio Gonnella
- TextFormats: Simplifying the definition and parsing of text formats in bioinformatics. PLOS ONE, 17(5), e0268910.
  Giorgio Gonnella
- Collection of prokaryotic genome contents expectation rules from scientific literature
  Serena Lam & Giorgio Gonnella
- Designing text representations for existing data using the TextFormats Specification Language
  Giorgio Gonnella
- EGC: a format for expressing prokaryotic genomes content expectations
  Giorgio Gonnella
- ProSt: computing, storing and visualizing attributes of prokaryotic genomes
  Giorgio Gonnella
- Unambiguously expressing expectations about the content of prokaryotic genomes
  Giorgio Gonnella
