Comprehensive Modelling of Speech Contributions in Prose Texts (Umfassende Modellierung von Redebeiträgen in Prosatexten)

Subject Area: General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Funding: 2017 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 350397899

Year of creation: 2022

Summary of Project Results

In sum, the project has made contributions on a number of levels. The first is advances in the specific modelling of quotation detection, namely robust models (study 1) and rich datasets (study 2). The second, in line with the trend in computational linguistics towards unifying previously distinct tasks, is a better understanding of quotation detection as a more general information extraction task involving span detection (study 4) and slot filling (study 3). The third is the application of these ideas in digital humanities (study 5). The uptake of our models and software by other research projects demonstrates that quotation detection, as we defined it, can now be carried out at a reasonable level of accuracy and robustness.

The next frontier consists in integrating such "local" quotation information into a "global" understanding of a complete conversation or, beyond the conversation, into relations between actors. Some of these questions are being addressed in our ongoing project MARDY (Modeling argumentation dynamics), which aims at building discourse networks (linking political actors and the claims that they make) from newspaper reports. However, newspaper reports are considerably more formulaic in their use of reported speech, and the integration of quotation detection into more global text understanding of literary texts is, to our knowledge, still an open problem.

At the interpersonal level, one outcome of the project is the establishment of personal contacts between the Theoretical Computational Linguistics group at IMS Stuttgart and the chair of Romance Philology (Literary Studies), Hanno Ehrlicher, at Tübingen University, based on a shared interest in the analysis of quotations in historical corpora.

Project-Related Publications (Selection)

  • DERE: A task and domain-independent slot filling framework for declarative relation extraction. Proceedings of EMNLP. Brussels, Belgium, 2018
    Heike Adel, Laura Ana Maria Bostan, Sean Papay, Sebastian Padó and Roman Klinger
    (See online at https://doi.org/10.18653/v1/D18-2008)
  • Quotation Detection and Classification with a Corpus-Agnostic Model. Proceedings of RANLP. Varna, Bulgaria, 2019
    Sean Papay and Sebastian Padó
    (See online at https://doi.org/10.26615/978-954-452-056-4_103)
  • Dissecting Span Identification Tasks with Performance Prediction. Proceedings of EMNLP, 2020, pages 4881–4895
    Sean Papay, Roman Klinger and Sebastian Padó
    (See online at https://doi.org/10.18653/v1/2020.emnlp-main.396)
  • RiQuA: A Corpus of Rich Quotation Annotation for English Literary Text. Proceedings of LREC, pages 835–841
    Sean Papay and Sebastian Padó