GRK 1906: Computational Methods for the Analysis of the Diversity and Dynamics of Genomes

Subject area: Basic Biological and Medical Research
Computer Science
Mathematics

Funding: from 2013 to 2018

Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 221270173

Report year: 2020

Summary of Project Results

Enabled by modern high-throughput analytic biotechnologies, genomic research has moved from studying single genomes to the concurrent analysis of multiple genomes. In this International Research Training Group (IRTG), we have developed new computational approaches targeting both (i) genome diversity, i.e., the variation between different samples, species, strains, individuals, cells, etc., and (ii) genome dynamics arising from random mutation, recombination, evolutionary pressure and selection. To this end, we subdivided our research program into different areas addressing diverse methodological needs. In the context of Area 1, “Scale-up call: Enhancing computational capacity”, the method of choice has been to develop new tools within modern distributed IT environments. This makes high-performance computing affordable and makes the algorithms available close to the data. Within the IRTG, several approaches to scaling up have been pursued. Containerisation of applications (e.g. via Docker) has enabled easy deployment in distributed computing infrastructures, integration into workflow systems, and reproducible analyses. Integrating existing tools and “dockerized” applications into the MapReduce streaming framework allows robust distribution in cloud environments. For other applications, algorithms have been implemented natively in the MapReduce framework. These approaches have been used successfully to run metagenomics workflows and publish reproducible results, and to scale up both metagenomics analyses and comparative genome analyses.

Research in Area 2, “Data management: Basic storage and retrieval”, has focused on novel data structures that store sequences efficiently together with high-level metadata. In particular, data structures for indexing and compressing pangenomes have been developed, together with algorithms for their functional analysis. Furthermore, a data-warehouse-driven online tool for metadata-based studies of metagenomes has been developed.
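One of the pan-genome data structures from this work, the Bloom Filter Trie (see the publication list), builds, as its name suggests, on Bloom filters over k-mers. The Python sketch below shows only that basic ingredient, under our own assumptions: one plain Bloom filter per genome, queried to find which genomes (probably) contain a given k-mer. It is not the Bloom Filter Trie itself; the class `KmerBloomFilter`, the helper `genomes_containing`, the BLAKE2b-based double hashing, the filter size, k = 5 and the toy genomes are all hypothetical choices, and reverse complements are ignored.

```python
import hashlib


class KmerBloomFilter:
    """Plain Bloom filter over the k-mers of DNA sequences (illustrative sketch)."""

    def __init__(self, num_bits: int, num_hashes: int, k: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.k = k
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, kmer: str) -> list[int]:
        # Double hashing: split one 128-bit BLAKE2b digest into h1 and h2 and
        # use h1 + i * h2 (mod num_bits) as the i-th bit position.
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, kmer: str) -> None:
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer: str) -> bool:
        # True only if all bits are set: never a false negative, possibly a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(kmer))

    def add_sequence(self, seq: str) -> None:
        # Insert every k-mer of seq (forward strand only; reverse complements ignored).
        for i in range(len(seq) - self.k + 1):
            self.add(seq[i:i + self.k])


def genomes_containing(kmer: str, filters: dict[str, KmerBloomFilter]) -> list[str]:
    """Names of the genomes whose filter reports the k-mer (a 'colour' lookup)."""
    return [name for name, bf in filters.items() if kmer in bf]


# Hypothetical toy pan-genome of two strains.
genomes = {"strain_A": "ACGTACGTGGA", "strain_B": "TTACGTACC"}
filters = {}
for name, seq in genomes.items():
    bf = KmerBloomFilter(num_bits=1 << 16, num_hashes=4, k=5)
    bf.add_sequence(seq)
    filters[name] = bf

print(genomes_containing("ACGTA", filters))  # → ['strain_A', 'strain_B']
```

Because a Bloom filter has no false negatives, every genome that really contains a k-mer is returned; a genome that does not may occasionally be reported too, at a rate governed by the number of bits and hash functions. Keeping one full filter per genome stores shared k-mers once per genome, which is the kind of redundancy dedicated pan-genome structures aim to avoid.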
For the development of new algorithms and methods (Areas 3–5), different fields of application were addressed. Most notably, researchers of the IRTG developed algorithms for the computational determination of functional RNAs, for the efficient grouping and clustering of NGS data, for reconstructing ancestral genomes including ancient DNA, for simulating the mutation process along the ancestral line of populations under selection, for predicting and visualizing 3D protein-protein networks to identify and analyse drug-drug interactions, for microfluidics time-lapse image analysis and visualization, and for visualizing molecular dynamics and co-location in MSI (mass spectrometry imaging) and polyomics data. The methodologies used range from the design of models, algorithms and data structures to machine learning.
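The grouping and clustering of NGS data mentioned above includes GeFaST (see the publication list), which generalises Swarm's fastidious clustering approach for OTU assignment. The Python sketch below illustrates only a simplified version of the underlying Swarm-style idea, under our own assumptions: amplicons are taken as seeds in order of decreasing abundance, and each cluster grows breadth-first by absorbing every unclustered amplicon within at most d edits (Levenshtein distance) of any member. Swarm's further refinements, the fastidious step and GeFaST's generalisations are not modelled; the function names and toy amplicons are hypothetical.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a single rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def swarm_like_clusters(abundances: dict[str, int], d: int = 1) -> list[list[str]]:
    """Simplified Swarm-style single-linkage clustering (hypothetical helper).

    Seeds are taken in order of decreasing abundance; each cluster then grows
    breadth-first by absorbing every unclustered amplicon within d edits of a member.
    """
    unclustered = sorted(abundances, key=lambda s: (-abundances[s], s))
    clusters = []
    while unclustered:
        seed = unclustered.pop(0)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            neighbours = [s for s in unclustered if edit_distance(current, s) <= d]
            for s in neighbours:
                unclustered.remove(s)
                cluster.append(s)
                queue.append(s)
        clusters.append(cluster)
    return clusters


# Hypothetical amplicons with their read counts.
amplicons = {"ACGTACGT": 50, "ACGTACGA": 10, "ACGTACGG": 3,
             "TTTTGGGG": 20, "TTTTGGGA": 2}
print(swarm_like_clusters(amplicons, d=1))
# → [['ACGTACGT', 'ACGTACGA', 'ACGTACGG'], ['TTTTGGGG', 'TTTTGGGA']]
```

The neighbour search compares each cluster member against all remaining amplicons, so this sketch is quadratic in the number of amplicons; it is meant to show the growth rule, not the efficiency techniques that dedicated clustering tools rely on.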

Selected Project-Related Publications

Mycoplasma salivarium as a dominant coloniser of Fanconi anaemia associated oral carcinoma. PLoS One, 9(3), e92297, 2014
Henrich, Birgit; Rumming, Madis; Sczyrba, Alexander; Velleuer, Eunike; Dietrich, Ralf; Gerlach, Wolfgang; Gombert, Michael; Rahn, Sebastian; Stoye, Jens; Borkhardt, Arndt & Fischer, Ute
Scaffolding of ancient contigs and ancestral reconstruction in a phylogenetic framework. IEEE-ACM Trans. Comput. Biol. Bioinform. 15(6), 2094–2100, 2018
Luhmann, Nina; Chauve, Cedric; Stoye, Jens & Wittler, Roland
Scaffolding of ancient contigs and ancestral reconstruction in a phylogenetic framework. In: Proc. of BSB 2014, 135–143, 2014
Luhmann, Nina; Chauve, Cedric; Stoye, Jens & Wittler, Roland
Automatic discovery of metagenomic structure. In: Proc. of IJCNN 2015. 2015
Lux, Markus; Sczyrba, Alexander & Hammer, Barbara
Bloom Filter Trie – a data structure for pan-genome storage. In: Proc. of WABI 2015, 217–230, 2015
Holley, Guillaume; Wittler, Roland & Stoye, Jens
CellWhere: graphical display of interaction networks organized on subcellular localizations. Nucleic Acids Res. 43(W1), W571–W575, 2015
Zhu, Lu; Malatras, Apostolos; Thorley, Matthew; Aghoghogbe, Idonnya; Mer, Arvind; Duguez, Stéphanie; Butler-Browne, Gillian; Voit, Thomas & Duddy, William
acdc – automated contamination detection and confidence estimation for single-cell genome data. BMC Bioinformatics, 17. 2016
Lux, Markus; Krüger, Jan; Rinke, Christian; Maus, Irena; Schlüter, Andreas; Woyke, Tanja; Sczyrba, Alexander & Hammer, Barbara
Bloom Filter Trie: an alignment-free and reference-free data structure for pan-genome storage. Algorithms Mol. Biol. 11. 2016
Holley, Guillaume; Wittler, Roland & Stoye, Jens
Identification and genome reconstruction of abundant distinct taxa in microbiomes from one thermophilic and three mesophilic production-scale biogas plants. Biotechnol. Biofuels, 9. 2016
Stolze, Yvonne; Bremges, Andreas; Rumming, Madis; Henke, Christian; Maus, Irena; Pühler, Alfred; Sczyrba, Alexander & Schlüter, Andreas
Omics Fusion – a platform for integrative analysis of omics data. J. Integr. Bioinform. 13(4), 296, 2016
Brink, Benedikt G.; Seidel, Annica; Kleinbölting, Nils; Nattkemper, Tim W. & Albaum, Stefan P.
The SCJ small parsimony problem for weighted gene adjacencies. In: Proc. of ISBRA 2016, 200–210, 2016
Luhmann, Nina; Thévenin, Annelyse; Ouangraoua, Aïda; Wittler, Roland & Chauve, Cedric
A review of bioinformatics platforms for comparative genomics. Recent developments of the EDGAR 2.0 platform and its utility for taxonomic and phylogenetic studies. J. Biotechnol. 261, 2–9, 2017
Yu, J.; Blom, J.; Glaeser, S.P.; Jaenicke, S.; Juhre, T.; Rupp, O.; Schwengers, O.; Spänig, S. & Goesmann, A.
Analyzing large scale genomic data on the cloud with Sparkhit. Bioinformatics, 34(9), 1457–1465, 2018
Huang, Liren; Krüger, Jan & Sczyrba, Alexander
Bayesian collective Markov random fields for subcellular localization prediction of human proteins. In: Proc. of ACM BCB 2017, 321–329, 2017
Zhu, Lu & Ester, Martin
Comparative methods for reconstructing ancient genome organization. In: Comparative Genomics, 343–362. Springer, 2018
Anselmetti, Yoann; Luhmann, Nina; Bérard, Sèverine; Tannier, Eric & Chauve, Cedric
Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes. Microbial Genomics, 3(9). 2017
Luhmann, Nina; Doerr, Daniel & Chauve, Cedric
Dynamic alignment-free and reference-free read compression. In: Proc. of RECOMB 2017. LNCS, 50–65, 2017
Holley, Guillaume; Wittler, Roland; Stoye, Jens & Hach, Faraz
Feature relevance bounds for linear classification. In: Proc. of ESANN 2017, Special Session on Biomedical data analysis in translational research: integration of expert knowledge and interpretable models. 2017
Göpfert, C.; Pfannschmidt, L. & Hammer, B.
Methods for the identification of common RNA motifs. Universität Bielefeld. PhD thesis. 2017, 140
Löwes, B.
Pan-genome storage and analysis techniques. In: Comparative Genomics, 29–53. Springer, 2018
Zekic, Tina; Holley, Guillaume & Stoye, Jens
Phylogenetic assembly of paleogenomes integrating ancient DNA data. Universität Bielefeld. PhD thesis. 2017
Luhmann, N.
Rapid protein alignment in the cloud: HAMOND combines fast DIAMOND alignments with Hadoop parallelism. J. Biotechnol. 257, 58–60, 2017
Yu, Jia; Blom, Jochen; Sczyrba, Alexander & Goesmann, Alexander
ViCAR: an adaptive and landmark-free registration of time lapse image data from microfluidics experiments. Front. Genetics, 8, 69, 2017
Hattab, Georges; Schlüter, Jan-Philip; Becker, Anke & Nattkemper, Tim W.
A novel methodology for characterizing cell subpopulations in automated time-lapse microscopy. Front. Bioeng. Biotechnol. 6, 17, 2018
Hattab, Georges; Wiesmann, Veit; Becker, Anke; Munzner, Tamara & Nattkemper, Tim W.
Analyzing colony dynamics and visualizing cell diversity in spatiotemporal experiments. Universität Bielefeld. PhD thesis. 2018
Hattab, G.
Context-specific subcellular localization prediction: Leveraging protein interaction networks and scientific texts. Universität Bielefeld. PhD thesis. 2018
Zhu, L.
ddPCRclust: an R package and Shiny app for automated analysis of multiplexed ddPCR data. Bioinformatics, 34(15), 2687–2689, 2018
Brink, Benedikt G.; Meskas, Justin & Brinkman, Ryan R.
Dynamic alignment-free and reference-free read compression. J. Comp. Biol. 25(7), 825–836, 2018
Holley, Guillaume; Wittler, Roland; Stoye, Jens & Hach, Faraz
Efficient grouping methods for the annotation and sorting of single cells. Universität Bielefeld. PhD thesis. 2018
Lux, M.
GeFaST: An improved method for OTU assignment by generalising Swarm’s fastidious clustering approach. BMC Bioinformatics, 19(1), 321, 2018
Müller, Robert & Nebel, Markus E.
GenCoNet–a graph database for the analysis of comorbidities by gene networks. J. Integr. Bioinform. 15(4). 2018
Shoshi, Alban; Hofestädt, Ralf; Zolotareva, Olga; Friedrichs, Marcel; Maier, Alex; Ivanisenko, Vladimir A.; Dosenko, Victor E. & Bragina, Elena Yu
Interpretation of linear classifiers by means of feature relevance bounds. Neurocomputing, 298, 69–79, 2018
Göpfert, Christina; Pfannschmidt, Lukas; Göpfert, Jan Philip & Hammer, Barbara
Metadata-driven computational (meta)genomics. A practical machine learning approach. Universität Bielefeld. PhD thesis. 2018
Rumming, M.
Molecular relationships between bronchial asthma and hypertension as comorbid diseases. J. Integr. Bioinform. 15(4). 2018
Bragina, Elena Yu.; Goncharova, Irina A.; Garaeva, Anna F.; Nemerov, Evgeniy V.; Babovskaya, Anastasija A.; Karpov, Andrey B.; Semenova, Yulia V.; Zhalsanova, Irina Z.; Gomboeva, Densema E.; Saik, Olga V.; Zolotareva, Olga I.; Ivanisenko, Vladimir A.; Dosenko, Victor E.; Hofestaedt, Ralf & Freidin, Maxim B.
Novel candidate genes important for asthma and hypertension comorbidity revealed from associative gene networks. BMC Med. Genomics, 11(1), 15, 2018
Saik, Olga V.; Demenkov, Pavel S.; Ivanisenko, Timofey V.; Bragina, Elena Yu; Freidin, Maxim B.; Goncharova, Irina A.; Dosenko, Victor E.; Zolotareva, Olga I.; Hofestaedt, Ralf; Lavrik, Inna N.; Rogaev, Evgeny I. & Ivanisenko, Vladimir A.
Omics visualization and its application to presymptomatic diagnosis of oral cancer. Universität Bielefeld. PhD thesis. 2018
Brink, B.
Pan-genome search and storage. Universität Bielefeld. PhD thesis. 2018
Holley, G.
Search for new candidate genes involved in the comorbidity of asthma and hypertension based on automatic analysis of scientific literature. J. Integr. Bioinform. 15(4). 2018
Saik, O. V.; Demenkov, P. S.; Ivanisenko, T. V.; Bragina, E. Y.; Freidin, M. B.; Dosenko, V. E.; Zolotareva, O. I.; Choynzonov, E. L.; Hofestaedt, R. & Ivanisenko, V. A.
SeeVis-3D space-time cube rendering for visualization of microfluidics image data. Bioinformatics, 35(10), 1802–1804, 2019
Hattab, Georges & Nattkemper, Tim W.
flowLearn: fast and precise identification and quality checking of cell populations in flow cytometry. Bioinformatics, 34(13), 2245–2253, 2018
Lux, Markus; Brinkman, Ryan Remy; Chauve, Cedric; Laing, Adam; Lorenc, Anna; Abeler-Dörner, Lucie & Hammer, Barbara
A survey of gene prioritization tools for Mendelian and complex human diseases. J. Integr. Bioinform. 16(4). 2019
Zolotareva, Olga & Kleine, Maren
Cloud-based bioinformatics framework for next-generation sequencing data. Universität Bielefeld. PhD thesis. 2019
Huang, L.
Comorbidity of asthma and hypertension may be mediated by shared genetic dysregulation and drug side effects. Scientific Reports, 9(1), 1–11, 2019
Zolotareva, Olga; Saik, Olga V.; Königs, Cassandra; Bragina, Elena Yu.; Goncharova, Irina A.; Freidin, Maxim B.; Dosenko, Victor E.; Ivanisenko, Vladimir A. & Hofestädt, Ralf
Detection and visualization of communities in mass spectrometry imaging data. BMC Bioinformatics, 20(1), 303, 2019
Wüllems, Karsten; Kölling, Jan; Bednarz, Hanna; Niehaus, Karsten; Hans, Volkmar H. & Nattkemper, Tim W.
Feature relevance bounds for ordinal regression. In: Proc. of ESANN 2019. 2019
Pfannschmidt, L.; Jakob, J.; Biehl, M.; Tino, P. & Hammer, B.
FRI–Feature relevance intervals for interpretable and interactive data exploration. In: Proc. of CIBCB 2019, 1–10, 2019
Pfannschmidt, Lukas; Göpfert, Christina; Neumann, Ursula; Heider, Dominik & Hammer, Barbara
HyAsP, a greedy tool for plasmids identiﬁcation. Bioinformatics, 35(21), 4436–4439, 2019
Müller, Robert & Chauve, Cedric
Identification of the genetic factors underlying comorbidity between bronchial asthma and hypertension. Eur. J. Hum. Genet. 27(Suppl. 1), 1035–1036, 2019
Bragina, E.; Freidin, M.; Saik, O.; Zolotareva, O.; Goncharova, I.; Ivanisenko, V.; Dosenko, V. & Hofestädt, R.
The SCJ small parsimony problem for weighted gene adjacencies. IEEE-ACM Trans. Comput. Biol. Bioinform. 16. 2019. Epub 2017
Luhmann, Nina; Lafond, Manuel; Thevenin, Annelyse; Ouangraoua, Aida; Wittler, Roland & Chauve, Cedric
Tissue-specific subcellular localization prediction using multi-label Markov random fields. IEEE-ACM Trans. Comput. Biol. Bioinform. 16(5), 1471–1482, 2019
Zhu, Lu; Hofestadt, Ralf & Ester, Martin