Project Details
QASciInf: Question Answering for Scientific Information
Applicant
Professor Dr. Iryna Gurevych
Subject Area
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2014
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 252295018
The number of published scientific articles has grown exponentially over the last few decades, which makes it nearly impossible for researchers to find and benefit from all relevant work. In this project, we address this problem and propose a set of novel techniques for question answering (QA) over scientific information. The unique challenges of the scientific domain require novel approaches that are, to date, unexplored in QA research. In particular, a QA system for scientific information needs to (a) consider information from heterogeneous sources, (b) use better context-aware methods to process the long contexts of scientific articles, and (c) reason over the content of tables to generate answers grounded in the data. To enable research in this direction, we construct two novel datasets: (1) one for hybrid question answering over the text of scientific articles, table data, and discussions on the web, and (2) one for generating informative table descriptions through reasoning over the table content. In contrast to the setting of existing QA datasets, scientific QA is not limited to question-answer pairs that address the text of articles: some questions can only be answered by reasoning over scientific tables, and some can be answered by using related discussions on the web. Based on these datasets, we therefore propose approaches that select relevant content from web discussions while incorporating rich contextual information from the scientific article and the retrieved discussions. In addition, we investigate novel text generation models that are capable of reasoning over complex scientific tables. Because reasoning-aware table-to-text generation requires a considerable amount of training data, we propose novel methods for training generalizable table-to-text models by automatically expanding the training data with weakly supervised and semi-supervised techniques.
Finally, we consolidate our models in a prototype for hybrid QA over scientific literature which we evaluate in a user study.
DFG Programme
Research Grants