Project Details

Chat4Data: Improving the Findability of Research Data in Earth System Sciences with Large Language Models

Subject Area: Data Management, Data-Intensive Systems, Computer Science Methods in Business Informatics
Geodesy, Photogrammetry, Remote Sensing, Geoinformatics, Cartography
Methods in Artificial Intelligence and Machine Learning
Term: since 2026
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 574305041
 
Findable: this is what the "F" in FAIR data management stands for. To make data findable, the FAIR principles focus on global identifiers and on searchable, "rich" metadata for each data set. However, documenting data with rich metadata is time-consuming, and searches have so far been largely limited to keywords. In the proposed project, we want to use current methods of artificial intelligence to improve the findability of data sets and thereby enable their subsequent reuse. To this end, natural language processing (NLP) methods will be used to develop and test AI models that combine knowledge about a scientific discipline with the data records from that discipline. The models will be trained on metadata from research data repositories as well as on discipline-specific texts (publications and articles from data journals) so that they can better understand the content of the data. Researchers are expected to benefit by interacting with the AI model through a chatbot that guides them to the data sets they are looking for through a dialogue of questions and answers. The Earth System Sciences (ESS), in cooperation with the NFDI consortium NFDI4Earth, serve as the concrete pilot use case. The project can build on preliminary work by ScaDS.AI Dresden/Leipzig, the national competence center for big data and AI, and by the BMWK project OpenGPT-X.
DFG Programme: Research data and software (Scientific Library Services and Information Systems)