Computational Models of Misunderstanding for Complex Instructional Texts

Applicant Professor Dr. Michael Roth

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages

Term from 2018 to 2025

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 411013080

Project Description

The planned Emmy Noether group is concerned with the systematic analysis and computational modelling of text passages that can lead to misunderstandings. In this context, we define a "potential for misunderstanding" as an omission or semantic ambiguity, on the basis of which a reader of a text can arrive at an interpretation that appears plausible in itself, but that was not intended by the author. Consider, for example, the following sentence from a board game manual: "In each round a player performs one action, starting with the starting player." Can only one player perform an action in one round, or can every player act in each round? And in what order do the players take their turns? The interpretation intended by the author remains unclear, since relevant aspects of meaning were expressed only ambiguously or not explicitly at all.

The proposed junior research group will develop computational-linguistic resources for the automatic recognition of misleading text passages. These resources will make it possible to diagnose, for example, whether route directions or medical instructions are easily misunderstood. We have three objectives.

(1) In order to enable a systematic analysis of misunderstandings in written language, we will first create large-scale corpora of text segments from which potential misunderstandings arise. These segments will give us insights regarding the semantic phenomena that most often lead to misunderstandings. We will create corpora semi-automatically by making pairwise comparisons between two versions of a manual and specifically searching for clarifications in the revised version, which are due to potential misunderstandings in the older version.

(2) On the basis of the collected data, we will develop statistical methods that will automatically detect potential misunderstandings. In order to achieve this goal, we will develop various computational-linguistic approaches that model the relationship between omissions or semantic ambiguities and the potential for misunderstanding in a given context. We will use probabilistic and neural models, which will be adapted to the respective semantic phenomena. In addition to the linguistic context, we will also consider the visual context and, to a limited extent, relevant world knowledge.

(3) Finally, we will test the practical benefits of the methods developed for natural language processing applications. Specifically, we will examine how predictions about the potential for misunderstanding can be used (a) to prevent machine translation systems from generating misleading target language output, (b) to make writing assistance tools draw a user’s attention to unclear wording, and (c) to detect unreliable predictions of semantic analysis methods.
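The pairwise version comparison described under objective (1) can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not the project's actual pipeline: it assumes both versions of a manual are already split into sentences, it uses the standard-library `difflib` module as a stand-in for whatever alignment method the group employs, and the function name `find_revised_sentences`, the `min_ratio` threshold, and the revised example sentence are all hypothetical.

```python
import difflib


def find_revised_sentences(old_sentences, new_sentences, min_ratio=0.6):
    """Return (old, new) pairs of sentences that were rewritten between versions.

    Hypothetical helper: a sentence counts as a candidate clarification if the
    revision replaced it with a similar, but not identical, sentence.
    """
    # Align the two versions at the sentence level.
    matcher = difflib.SequenceMatcher(a=old_sentences, b=new_sentences, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # skip unchanged, purely inserted, or purely deleted sentences
        for old, new in zip(old_sentences[i1:i2], new_sentences[j1:j2]):
            # Character-level similarity between the old and the revised sentence.
            ratio = difflib.SequenceMatcher(a=old, b=new, autojunk=False).ratio()
            if min_ratio <= ratio < 1.0:
                pairs.append((old, new))
    return pairs


# Older version: contains the ambiguous sentence quoted in the description.
old_manual = [
    "Shuffle the deck.",
    "In each round a player performs one action, starting with the starting player.",
    "The game ends after ten rounds.",
]
# Revised version: the middle sentence is an invented clarification (assumption).
new_manual = [
    "Shuffle the deck.",
    "In each round every player performs one action in clockwise order, "
    "starting with the starting player.",
    "The game ends after ten rounds.",
]

candidates = find_revised_sentences(old_manual, new_manual)
for old, new in candidates:
    print("OLD:", old)
    print("NEW:", new)
```

A surface comparison of this kind can only propose candidates. Deciding whether a revision is a genuine clarification, rather than a stylistic change or a rule change, is the part the description calls semi-automatic.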

DFG Programme Emmy Noether Independent Research Groups
