Project Details
Digital Media Workflow II: Cooperative Data Infrastructures for AI and TDM
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 576318340
Building on the Digital Media Workflow (WDM) project, SUB Göttingen and ULB Darmstadt are developing, in the project ‘Digital Media Workflow II: Cooperative Data Infrastructures for AI and TDM’ (WDM-II), a technical system in which a community of infrastructure institutions can collaboratively harvest and process texts and make them available to researchers for TDM (text and data mining) analyses in a uniformly structured XML format. This format for full text and metadata was developed in the WDM project on the basis of the Text Encoding Initiative (TEI) and meets the needs of both researchers and infrastructure institutions. The project aims to establish a web-based provision system (‘WDM Hub’) into which all participating institutions will be able to upload the texts they have harvested and converted to TEI format, ideally together with all associated image and supplementary information files. Researchers will then be able to retrieve the files both via an API and via a graphical user interface. While the WDM project was limited to open access literature, the focus will now be expanded to include licence-protected literature. To this end, the project will examine whether and how a rights and role management system can be established that, subject to the publisher's consent, makes texts available only to authorised users of the respective institution. To realise this, agreements with publishers will be sought. The experience gained in the project will be incorporated into a publicly accessible guide. Parallel to the technical developments, the project will build a community of interested infrastructure organisations that will contribute to the collaborative harvesting of texts.
To give users easy access to the content made available through the project, a labelling system will be established in the Electronic Journals Library (EZB) indicating that a given journal is available in a format suitable for TDM. This project is intended as the first of two phases building on the developments of the WDM project. After the three-year funding period, the aim is to consolidate the systems developed in a follow-up project and to further intensify networking within the community.
DFG Programme
Research data and software (Scientific Library Services and Information Systems)
