Speech representation - A literary and linguistic corpus study

Applicants Dr. Annelen Brunner; Professor Dr. Stefan Engelberg; Professor Dr. Fotis Jannidis

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
German Literary and Cultural Studies (Modern German Literature)

Term from 2016 to 2021

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 322751860

The goals of the project are to develop automatic methods for recognizing speech representation and, on the basis of their results, to conduct literary and linguistic analyses of the patterns and usage of speech representation. The project will show how methods of the Digital Humanities lead to a stronger convergence of literary and linguistic studies without abandoning the research interests specific to each discipline. The methodological focus lies on developing (semi-)automatic computational and corpus-linguistic strategies, which are applied to a corpus of novels, newspapers and periodicals (temporal focus: 1841-1918). These texts are already available in digital format and only need to be adapted for the corpus.
The project can build on an existing prototype for recognizing speech representation that uses rule-based methods and machine learning and is implemented in an established programming framework (UIMA). Based on a manual annotation of part of the corpus, this prototype will be improved and extended. Upon completion of the project, the recognizer and the manually annotated corpus texts will be made available to the scientific community. The project thus contributes to the development of NLP tools for the automatic annotation of large corpora.
The recognizer will generate quantitative data that allows, for the first time, a narratological and linguistic study of speech representation on a broad empirical basis. This data makes it possible to approach several open research questions in linguistics and literary studies. Apart from the historical dimension, the influence of different text types (newspaper texts, fiction, magazines) and the distinction between 'high' and 'popular' literature are of particular interest.
Key theoretical research questions are: 1) the diachronic and text-type-dependent development of the four types of speech representation (direct, free indirect, indirect and reported); 2) the diachronic and text-type-dependent development of inquit formulas from a lexical and structural perspective; 3) speech representation verbs as an example of mechanisms of linguistic change. By working on these questions, the project contributes to theory construction in narratology, text type and genre studies, and the study of lexical argument structures.

DFG Programme Research Grants
