Project Details
A minimal infrastructure for the sustainable provision of extensible multi-layer annotation software for linguistic corpora
Subject Area
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Security and Dependability, Operating-, Communication- and Distributed Systems
Software Engineering and Programming Languages
Term
from 2018 to 2022
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 391160252
The project's goal is the design, implementation, evaluation and documentation of a minimal infrastructure for the sustainable provision of research software. By hypothesis, an infrastructure can only be operated sustainably in an academic context if the technical and human resources that the respective academic institution has to provide are minimized for the long term from the outset. In a case study providing a multi-layer annotation software for linguistic corpora, the project exemplifies that only four components are strictly necessary for such an infrastructure: a source code repository platform; a repository providing versions of the software for end users; a repository providing the software's dependencies (e.g., software libraries); and a maintainer who administers and publishes the infrastructure and research software, and manages the user and developer communities. Of these components, arguably only the maintainer needs to be funded by the respective academic institution. For all other components, potentially sustainable external infrastructure is available free of charge. A further requirement for the sustainable operation of the infrastructure developed in the project is the technical sustainability of the research software it provides. In the course of the project, the prototype "GraphAnno" will be developed into a stable product. The product, "Hexatomic", has strong potential for use across different linguistic disciplines, as it satisfies a verifiably high demand on the part of these scientific communities. Additionally, "Hexatomic" will satisfy the requirement for technical sustainability early on in the project by implementing best practices of software engineering.
These include reproducible builds through an automated build system; comprehensive documentation of all aspects of the software; a permissive open source license; portability and the ability to run on different operating systems; comprehensive test suites; public provision of the source code; extensibility and adaptability through modularization and a generic data model; extensive compatibility with other tools and data standards; and well-structured community processes. The project evaluates and documents not only how well the software realizes its potential for use, but also its potential for long-term, project-independent development. This is partly achieved through the acquisition of external contributions of functional modules to "Hexatomic". Moreover, the minimal infrastructure model is not only implemented; its implementation is also documented and tested. The test results are, in turn, documented and condensed into best practices, which represent an important project goal in and of themselves. In combining a hypothesis-driven approach with a case study, the project makes an important contribution towards the evaluation of minimal requirements for sustainable infrastructures for research software.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)