Comprehensive Modelling of Speech Contributions in Prose Texts (Umfassende Modellierung von Redebeiträgen in Prosatexten)

Applicant: Professor Dr. Sebastian Padó

Subject areas: General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Funding: 2017 to 2022

Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 350397899

Year of report: 2022

Summary of Project Results

In sum, the project has made contributions on a number of levels. The first level comprises advances in the specific modelling of quotation detection, namely robust models (study 1) and rich datasets (study 2). The second level is in line with the trend in computational linguistics towards unifying previously distinct tasks: it concerns a better understanding of quotation detection as a more general information extraction task involving span detection (study 4) and slot filling (study 3). The third level is the application of these ideas in digital humanities (study 5). The uptake of the models and software that we developed by other research projects demonstrates that quotation detection, as we defined it, can now be carried out at a reasonable level of accuracy and robustness. The next frontier consists in integrating such "local" quotation information into a "global" understanding of a complete conversation or, beyond the conversation, of the relations between actors. Some of these questions are being addressed in our ongoing project, MARDY (Modeling argumentation dynamics), which aims at building discourse networks (linking political actors and the claims that they make) from newspaper reports. However, as we note above, newspaper reports are considerably more formulaic in their use of reported speech, and the integration of quotation detection into a more global understanding of literary texts is, to our knowledge, still an open problem. At the interpersonal level, one outcome of the project is the establishment of personal contacts between the Theoretical Computational Linguistics group at IMS Stuttgart and the chair of Romance Philology (Literary Studies), Hanno Ehrlicher, at Tübingen University, based on a shared interest in the analysis of quotations in historical corpora.

Project-Related Publications (Selection)

DERE: A task and domain-independent slot filling framework for declarative relation extraction. Proceedings of EMNLP. Brussels, Belgium, 2018
Adel, Heike; Bostan, Laura Ana Maria; Papay, Sean; Padó, Sebastian & Klinger, Roman
Quotation Detection and Classification with a Corpus-Agnostic Model. Proceedings of RANLP. Varna, Bulgaria, 2019
Papay, Sean & Padó, Sebastian
Dissecting Span Identification Tasks with Performance Prediction. Proceedings of EMNLP, pages 4881–4895
Papay, Sean; Klinger, Roman & Padó, Sebastian
RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. Proceedings of LREC, pages 835–841
Papay, Sean & Padó, Sebastian
