Project Details

Accessing multimodal spoken language corpora: cross-linking and user-group specific differentiation

Subject Area: General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages; Individual Linguistics, Historical Linguistics
Term: from 2017 to 2022
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 359880971
 
This project is a cooperation of three partners (the Archive for Spoken German at the IDS Mannheim, the Hamburg Centre for Language Corpora and the University of Leipzig) who have, over the last 15 years, made crucial contributions to the development of technology, best practices and standards in the area of oral corpora, and who have participated in various initiatives concerned with the curation of legacy data, the compilation of new corpora, and the establishment of archives and distribution platforms for such data. Oral corpora, i.e. collections of audio or video recordings of spoken language together with their transcriptions and annotations, are the empirical basis for studying a great variety of research questions in linguistics (e.g. conversation analysis, sociolinguistics/dialectology, phonetics/phonology, corpus lexicography), in speech technology, and in other academic disciplines (such as qualitative social studies, oral history and education studies). Current access methods for these data are still geared towards requirements closely tied to the specific history and circumstances of the respective institutions. As a user study conducted in preparation for this proposal revealed, many application scenarios would profit from a closer integration of these methods and from their differentiation according to specific user groups.

The aim of the project is thus to integrate and cross-link the access methods for the oral corpora which the partners archive and disseminate to the research community, and to differentiate these methods according to user groups. On the one hand, a common technical base structure will be developed to enable researchers to compare or aggregate data from different sources; in this way, the existing solutions will be professionalised and coordinated on a unified basis. On the other hand, methods will be developed which optimise access to the resources according to the needs of specific user groups.
The project will elaborate usage scenarios for two user groups (researchers working in foreign language teaching and researchers working in discourse and variation studies) and implement them as interfaces for corpus exploration and querying. These solutions will be transferable (to other data centres or resources) and extensible (to further usage scenarios).
DFG Programme: Research data and software (Scientific Library Services and Information Systems)
 
 
