QASciInf: Question Answering for Scientific Information
Final Report Abstract
The number of scientific articles is increasing exponentially, making it challenging for scientists to keep up with the latest research and benefit from all relevant work. Novel technologies must be developed to support researchers in accessing this rapidly expanding body of knowledge. Natural Language Processing (NLP), including Information Retrieval (IR), Question Answering (QA) systems, and generative methods, can play a pivotal role in addressing this challenge. Retrieval systems are used to find relevant information for downstream QA systems, but can also be used directly, e.g., in literature review. Given a question, QA systems can efficiently extract information from a publication. Table understanding is a core skill for any system applied to scientific content. In a generative model, it can be used, e.g., to summarize the results presented in a table. However, when the proposal was submitted, no resources were available to evaluate models on scientific tasks. Therefore, in the QASciInf project, we published datasets and benchmarks that introduced relevant tasks, enabling the systematic evaluation of models. We introduced a zero-shot benchmark for retrieval to evaluate systems on domains they have not been trained on (e.g., scientific domains where training data is scarce). Further, we proposed a context-aware retrieval benchmark to measure how well a system can take long contexts, such as scientific publications, into account. We introduced a dialog dataset over publications, and we published a QA dataset over papers with expert questions and answers. We also introduced a table-to-text dataset of scientific tables and their descriptions. Together, these datasets and benchmarks allow a comprehensive evaluation of NLP methods in the scientific domain. Further, we developed methods to improve performance on these tasks, which support scientists in their work.
We introduced a few-shot information retrieval task and proposed a method in which a system learns refined query representations from a few user demonstrations, which is helpful for literature review. In 2024, we evaluated baseline systems that take document context into account when representing a single passage, which improves retrieval in scientific QA scenarios. We also introduced a pre-training method that enhances the numerical reasoning ability of Large Language Models (LLMs) and, in turn, improves performance on table-to-text generation tasks. Finally, we evaluated QA systems in collaboration with the DFG-funded UKP-SQuARE project. We developed a public demonstrator, including various datastores and QA models, that allows users to compose and analyze methods (Baumgärtner et al., 2022; Sachdeva et al., 2022; Puerto et al., 2023).
Publications
-
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
Thakur, Nandan; Reimers, Nils; Rücklé, Andreas; Srivastava, Abhishek & Gurevych, Iryna
-
SciGen: a dataset for reasoning-aware text generation from scientific tables. Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
Moosavi, Nafise; Rücklé, Andreas; Roth, Dan & Gurevych, Iryna
-
Incorporating Relevance Feedback for Information-Seeking Retrieval using Few-Shot Document Re-Ranking. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 8988-9005. Association for Computational Linguistics.
Baumgärtner, Tim; Ribeiro, Leonardo F. R.; Reimers, Nils & Gurevych, Iryna
-
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing: System Demonstrations, 28-38. Association for Computational Linguistics.
Sachdeva, Rachneet; Puerto, Haritz; Baumgärtner, Tim; Tariverdian, Sewin; Zhang, Hao; Wang, Kexin; Saadi, Hossain Shaikh; Ribeiro, Leonardo F. R. & Gurevych, Iryna
-
UKP-SQUARE: An Online Platform for Question Answering Research. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 9-22. Association for Computational Linguistics.
Baumgärtner, Tim; Wang, Kexin; Sachdeva, Rachneet; Geigle, Gregor; Eichler, Max; Poth, Clifton; Sterz, Hannah; Puerto, Haritz; Ribeiro, Leonardo F. R.; Pfeiffer, Jonas; Reimers, Nils; Şahin, Gözde & Gurevych, Iryna
-
A Dataset of Argumentative Dialogues on Scientific Papers. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 7684-7699. Association for Computational Linguistics.
Ruggeri, Federico; Mesgar, Mohsen & Gurevych, Iryna
-
UKP-SQuARE v3: A Platform for Multi-Agent QA Research. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), 569-580. Association for Computational Linguistics.
Puerto, Haritz; Baumgärtner, Tim; Sachdeva, Rachneet; Fang, Haishuo; Zhang, Hao; Tariverdian, Sewin; Wang, Kexin & Gurevych, Iryna
-
UKP-SQuARE: An Interactive Tool for Teaching Question Answering. Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), 195-204. Association for Computational Linguistics.
Fang, Haishuo; Puerto, Haritz & Gurevych, Iryna
-
DAPR: A Benchmark on Document-Aware Passage Retrieval. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 4313-4330. Association for Computational Linguistics.
Wang, Kexin; Reimers, Nils & Gurevych, Iryna
-
PeerQA: A Scientific Question Answering Dataset from Peer Reviews. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 508-544. Association for Computational Linguistics.
Baumgärtner, Tim; Briscoe, Ted & Gurevych, Iryna
