Project Details

NatLit – Natural language Literature research in the metadata of the German National Library based on Large Language Models

Subject Area: Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics; Methods in Artificial Intelligence and Machine Learning
Term: since 2025
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 565051990
 
The German National Library (Deutsche Nationalbibliothek, DNB), together with the L3S Research Center at Leibniz Universität Hannover, is applying for a 24-month project to develop, test and evaluate a natural language query interface for literature research in the metadata of the German National Library. The application is made under the LIS funding program "e-research technologies – application-oriented development and testing of e-research technologies".

The DNB is the central institution for collecting, indexing and providing publications in Germany, and it fulfills its legal mandate to index, without gaps, all German or German-language publications issued since 1913. The entire metadata inventory of the German National Bibliography is accessible free of charge via the catalog of the German National Library. This catalog serves not only as a primary bibliographic index but also as an important source for scientific literature research. More than 33 million publications are currently listed in the DNB, along with the tables of contents of 2.2 million printed publications and almost 3 million publications available in full text.

A natural language query interface built on Large Language Models (LLMs) is intended to significantly reduce the prior knowledge required to use this DNB metadata and thereby improve the accessibility and user-friendliness of the data interfaces. The aim of the NatLit project is therefore to make the publications indexed in the German National Library's metadata easier to find, and to match them more accurately to research questions, through interaction with LLMs. Users without bibliographic knowledge or knowledge of the query language are supported in carrying out complex, precise and comprehensive search queries. The interactive literature search takes place in a natural language chat in two phases.
In phase 1 (“literature search”), a user formulates a search query for publications in the DNB holdings. Using a Retrieval-Augmented Generation approach, relevant entities are extracted from the query, and relevant metadata in the DNB catalog is identified via a subgraph search and passed to an LLM, together with library background knowledge, to answer the question. In phase 2 (“media summary”), users can ask questions about the publications found, which are then answered by an LLM. The quality of the results of this LLM-based literature research should be comparable to or better than that of conventional methods, measured against the criteria of technical functionality, search and response quality, and user experience. As a pilot project for the use of large language models with the extensive database of a national library, NatLit is intended to test a simpler and more efficient literature search.
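The phase 1 flow (extract entities from the query, find the matching part of the catalog graph, hand it to an LLM as context) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy triples are invented and are not DNB data, the function names are hypothetical, entity extraction is reduced to naive substring matching, and no language model is actually called. It is not the project's implementation.

```python
from collections import defaultdict

# Invented toy records as (record, predicate, value) triples.
# These are NOT real DNB data; they only mimic the shape of catalog metadata.
TRIPLES = [
    ("pub:1", "title", "Introduction to Graph Databases"),
    ("pub:1", "subject", "graph databases"),
    ("pub:1", "author", "A. Example"),
    ("pub:2", "title", "Machine Learning in Libraries"),
    ("pub:2", "subject", "machine learning"),
    ("pub:2", "author", "B. Sample"),
    ("pub:3", "title", "Learning Graph Embeddings"),
    ("pub:3", "subject", "machine learning"),
    ("pub:3", "subject", "graph databases"),
]


def build_index(triples):
    """Index triples by record and by lower-cased attribute value."""
    by_record = defaultdict(list)
    by_value = defaultdict(set)
    for record, predicate, value in triples:
        by_record[record].append((predicate, value))
        by_value[value.lower()].add(record)
    return by_record, by_value


def extract_entities(query, known_values):
    """Naive entity extraction: known catalog values that occur in the query."""
    text = query.lower()
    return sorted(v for v in known_values if v in text)


def subgraph_search(entities, by_record, by_value):
    """Collect every record linked to a matched entity, with its attributes."""
    records = set()
    for entity in entities:
        records |= by_value.get(entity, set())
    return {r: by_record[r] for r in sorted(records)}


def build_prompt(query, subgraph, background):
    """Assemble the text that would be handed to an LLM (no model is called)."""
    lines = [f"Background: {background}", "Catalog records:"]
    for record, attrs in subgraph.items():
        described = "; ".join(f"{p}={v}" for p, v in attrs)
        lines.append(f"- {record}: {described}")
    lines.append(f"Question: {query}")
    return "\n".join(lines)


if __name__ == "__main__":
    by_record, by_value = build_index(TRIPLES)
    question = "Which books about machine learning are in the catalog?"
    entities = extract_entities(question, by_value)
    subgraph = subgraph_search(entities, by_record, by_value)
    print(build_prompt(question, subgraph, "Subjects are controlled keywords."))
```

In a real system the substring matcher would presumably be replaced by proper entity linking against authority data, and the assembled prompt would be sent to a model; the sketch only shows how the retrieved subgraph restricts what the model sees.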
DFG Programme: Research data and software (Scientific Library Services and Information Systems)
 
 
