Continuous quality control for research data to ensure reproducibility: an institutional approach (CONQUAIRE)
Theoretical Computer Science
Final Report Abstract
The Conquaire project has analyzed in detail eight case studies in computational reproducibility involving research groups from areas as varied as computer science/robotics, psychology, linguistics, biology and chemistry. By accompanying the work of these groups over three years, it has developed a detailed understanding of the variety and heterogeneity of the analytical research workflows involved. In each of these case studies, Conquaire has managed to independently reproduce a central result published in one of the papers of the participating groups. As a result of the project, the scripts and data for these use cases are available in a university-wide Git system. The main obstacles to analytical reproducibility were i) the lack of documentation and the resulting reliance on guidance by the original authors, ii) the reliance on manual steps in the analytical workflow (e.g. clicking in a GUI), iii) the reliance on non-open and commercial software, and iv) the lack of information about which particular version of software and/or data was used to generate a specific result. In terms of infrastructure, Conquaire has built a layer on top of a Git system that allows researchers to commit their data early in the research process into a distributed version control system. This provides a backup service and, most importantly, versions the data and makes different versions of the data referenceable. The project has also implemented continuous integration principles on top of the Git system, allowing researchers to define tests that their data have to pass as a basis for ensuring data quality. It has implemented a badge system that publishes the results of the tests via the Bielefeld University PUB system to create incentives for researchers to make their data consistent and ready to be reused by others.
The use of social rewards was an interesting idea to explore, yet it remains to be seen whether this sort of incentive-creating mechanism will be accepted by the research community. Overall, the Conquaire project has provided a proof of concept that analytical reproducibility is indeed feasible and can be effectively supported by an institutional approach and infrastructure that help scientists deposit their code and data in an institutional repository, if not a public one, as a first step towards making artifacts referenceable and accessible in line with the FAIR principles.
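To illustrate what a researcher-defined data test of the kind described above might look like, here is a minimal sketch in Python. It is not taken from the Conquaire code base: the column names, the rules and the functions `check_csv` and `badge_for` are hypothetical and chosen only for illustration. A continuous integration job could run such a check on every commit and derive a pass/fail status from the result.

```python
import csv
import io

# Hypothetical rule set: the columns every data file is assumed to need.
REQUIRED_COLUMNS = ["participant_id", "reaction_time_ms"]


def check_csv(text):
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))

    # Stop early if the header lacks a required column.
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["participant_id"]:
            problems.append(f"line {line_no}: empty participant_id")
        try:
            value = float(row["reaction_time_ms"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: reaction_time_ms is not a number")
            continue
        if value < 0:
            problems.append(f"line {line_no}: reaction_time_ms is negative")
    return problems


def badge_for(problems):
    """Map the list of problems to a hypothetical badge status."""
    return "passed" if not problems else "failed"
```

In this sketch the test result is reduced to a single status string; how the actual badge system derives and publishes its results is not specified in the abstract.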
Publications
-
Conquaire: Towards an architecture supporting continuous quality control to ensure reproducibility of research. (2017) D-Lib Magazine 23(1/2)
Ayer V, Pietsch C, Vompras J, Schirrwagen J, Wiljes C, Jahn N, Cimiano P
-
Enabling Git based research data quality control for institutional repositories. 9th Plenary Meeting of the Research Data Alliance (RDA), Repository Platforms for Research Data IG
Ayer V, Pietsch C, Vompras J, Schirrwagen J, Wiljes C, Peil V, Cimiano P
-
Expanding the research data management service portfolio at Bielefeld University according to the three-pillar principle towards data FAIRness. Conference Abstract, Göttingen-CODATA RDM Symposium 2018, Göttingen
Schirrwagen J, Cimiano P, Ayer V, Pietsch C, Wiljes C, Vompras J, Pieper D
-
Conquaire: Coupling a local GitLab instance with an institutional repository for instant research data publications. Conference Abstract, Open Science Conference, Berlin
Pietsch C, Schirrwagen J, Peil V, Ayer V, Cimiano P, Herrmann F, Rempel A, Vompras J, Wiljes C
-
Expanding the Research Data Management Service Portfolio at Bielefeld University According to the Three-pillar Principle Towards Data FAIRness. (2019) Data Science Journal 18(1): 6
Schirrwagen J, Cimiano P, Ayer V, Pietsch C, Wiljes C, Vompras J, Pieper D