Data Management, Integration and Analysis

Applicants: Professor Dr. Johannes Betge; Dr. Maria Zimmermann

Subject Area: Gastroenterology

Term: since 2025

Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 537604907

Project Description

Datasets in biomedical research have grown exponentially in size in recent years with the emergence of large-scale technologies such as next-generation sequencing, phenotypic screening and single-cell sequencing. Analysing, storing and sharing these datasets is complex because of their size and the large number of data formats, analysis pipelines, legal restrictions, sharing options and databases. This leads to inefficiencies in research projects, hampers the secondary use of data and ultimately contributes to the reproducibility problem in science. GenoMiCC projects will produce more than 12 types of large-scale data, including sensitive data, and there is a high demand for data-sharing options. At present, the projects use different analysis pipelines and storage spaces.

We aim to standardise data and metadata management, improve and facilitate data sharing within the consortium, align data analysis pipelines for common data types, and provide support for data analysis and computing. To this end, we will provide a framework for all GenoMiCC projects that standardises the metadata of all large-scale datasets to facilitate combined analyses. We will set up a database containing information on all datasets generated during the project and, together with the Integrated Biobank Mannheim, we will provide a virtual biobank of the biomaterials available in the consortium. A central part of the project will be setting up a central data-sharing platform, or data space, for the consortium, building on the existing data storage and computing capacity of Heidelberg University and in cooperation with the Heidelberg-Mannheim Life Science Alliance Data Space project. On this basis, we will launch a data analysis centre that will work with all projects to unify, automate and design common analysis pipelines and to provide computing support across the consortium.

This will also enable meta-analyses of the consortium's datasets and the integration of these results with public data to place them in a broader context. Taken together, CP2 will advance scientific discovery and sustainability within GenoMiCC by standardising data management, facilitating data sharing, aligning analysis pipelines, and supporting analysis and computing.

DFG Programme: Research Units

Subproject of FOR 5806: Functional Genomics and Microbiomics in Precision Medicine of Colorectal Cancer (GenoMiCC)