Project Details

Reference Understanding in the Social Sciences (OUTCITE)

Subject Area: Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Term: 2016 to 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 293069437

Final Report Year: 2024

Final Report Abstract

The Outcite project set out to improve the accessibility and linking of citation data, particularly in the social sciences. It extends the earlier EXCITE project, which identified gaps in bibliographic databases, and focuses on references that are not easily found in existing databases, such as incomplete citations and web resources. The project developed tools to process these "non-source items" and match them to their original sources, making the citation records available for research more complete.

The core objective was a scalable toolchain that accurately links non-source items to their corresponding sources. The toolchain comprises four steps:

(i) Extraction and segmentation: references in academic full-text documents are extracted and segmented into metadata fields using existing state-of-the-art tools such as GROBID, CERMINE, and AnyStyle.
(ii) Matching and linking: the references are matched and linked to openly available bibliographic records from SSOAR, GESIS Search, the DNB collection, sowiport, arXiv, EconBiz, Crossref, and OpenAlex.
(iii) Deduplication: references are deduplicated to reduce redundancy and improve the completeness of the reference records.
(iv) Provisioning and distribution: a cron job runs the pipeline for SSOAR documents, and a live demonstrator has been developed for public use.

By the end of the project, Outcite had processed over 73,000 PDF documents from the SSOAR repository and ingested more than 3.4 million references into the GESIS Search database. About 1.74 million of these references have been successfully linked to their online sources. The citation data has been shared with the OpenCitations initiative for further processing. The research outcomes have been published in papers and presented at workshops and conferences held and attended during the project term.
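The deduplication step (iii) can be illustrated with a minimal Python sketch. The abstract does not describe how Outcite deduplicates references, so everything here is an assumption made for illustration: the field names (`title`, `year`, `author`, `publisher`), the choice of a normalized title plus year as the duplicate key, and the helper names `normalize`, `dedup_key`, and `deduplicate`. The sketch only shows how collapsing duplicates can both reduce redundancy and fill in fields that one copy of a reference is missing.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def dedup_key(ref):
    """Hypothetical duplicate key: normalized title plus publication year."""
    return (normalize(ref.get("title", "")), str(ref.get("year", "")))


def deduplicate(refs):
    """Keep one record per key; fill its empty fields from later duplicates."""
    merged = {}
    for ref in refs:
        key = dedup_key(ref)
        if key not in merged:
            merged[key] = dict(ref)
            continue
        for field, value in ref.items():
            # Only fill gaps; never overwrite a value that is already present.
            if value and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


references = [
    {"title": "Die Gesellschaft der Gesellschaft", "year": 1997, "author": "Luhmann"},
    {"title": "Die Gesellschaft der Gesellschaft.", "year": "1997", "author": "",
     "publisher": "Suhrkamp"},
    {"title": "Soziale Systeme", "year": 1984, "author": "Luhmann"},
]
unique = deduplicate(references)
```

In this example the first two entries collapse into one record that carries both the author and the publisher. An exact key of this kind misses duplicates with typos or truncated titles, so a production pipeline would likely need fuzzy matching as well.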
