Project Details

Experiments and models of speech recognition across tonal and non-tonal language systems (EMSATON)

Subject Area Acoustics; Electronic Semiconductors, Components and Circuits, Integrated Systems, Sensor Technology, Theoretical Electrical Engineering; Otolaryngology, Phoniatrics and Audiology
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 415895050
 
Human speech communication is the basis of our culture. Even though the articulation organs and the ear are very similar across all humans, their usage across languages shows high variability in solving the task of communicating effectively not only in quiet, but also under challenging acoustic conditions and in the presence of hearing impairment. The current project will shed light on how this is achieved by exploring the acoustic, phonetic and audiological foundations of speech recognition in tonal and non-tonal languages and the ability of current speech recognition models to replicate possible differences in recognition between these language types. The main long-term goal of EMSATON is to quantitatively understand how human speech recognition in noise is degraded under the influence of different talkers and speaking styles (e.g., Lombard speech), different language systems (tonal languages such as Mandarin and Cantonese vs. non-tonal Western languages such as German, English and Spanish) and different impairment factors (type of noise, reverberation, individual hearing impairment).

We will exploit and extend the closed-set multilingual Matrix sentence recognition test, which allows speech recognition to be assessed in a highly comparable way across languages and is available in 20 languages, including German, British and American English, Spanish, and recently Mandarin. We will develop a Matrix test in Cantonese to obtain a second tonal language as a reference and will relate the new tonal-language tests to the non-tonal ones. We will also investigate the effect of the talker by including (bilingual) talkers and the effect of speaking style (normal speech and Lombard speech produced with high vocal effort). Both objective acoustic-phonetic analyses and speech recognition modelling will be performed to better understand the differences between languages (tonal vs. non-tonal), talkers and speaking styles and the relative importance of different speech cues.

To identify the relevant factors underlying (speech-related) differences across very different languages and to evaluate a number of assumptions of existing models such as the SII, HASPI, STOI and the FADE model, the data collected across languages, talkers and speaking styles will be used to test the prediction accuracy of current models and to establish a benchmark set of data and predictions. This will provide the basis for a quantitative, model-based analysis of the language effect and several of its underlying factors across two tonal languages (Mandarin, Cantonese) and typical non-tonal languages (German, English, Spanish). A possible outcome is a set of guidelines for designing assistive listening and hearing devices in a more language-type-specific way, thus optimizing their benefit for users of tonal and non-tonal languages.
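As a rough illustration of the model-benchmarking step described above, the sketch below scores a single clean/degraded signal pair with STOI, one of the objective intelligibility predictors named in the project. It uses the open-source pystoi package; the synthetic signals and the chosen SNR are placeholders for illustration only, not project data or the project's actual evaluation pipeline.

```python
import numpy as np
from pystoi import stoi

fs = 16000                      # sampling rate in Hz
rng = np.random.default_rng(0)

# Placeholder signals: in the project these would be a recorded Matrix
# sentence and the same sentence mixed with noise at a given SNR.
clean = rng.standard_normal(3 * fs)          # stand-in for a clean utterance
noise = rng.standard_normal(3 * fs)
snr_db = 0.0                                 # assumed example SNR
noise *= np.sqrt(np.sum(clean**2) / np.sum(noise**2)) * 10 ** (-snr_db / 20)
noisy = clean + noise

# STOI returns a score roughly in [0, 1]; higher means higher predicted
# intelligibility. Comparing such predictions with measured speech
# recognition scores across languages, talkers and speaking styles is
# the kind of benchmarking the project describes.
score = stoi(clean, noisy, fs, extended=False)
print(f"STOI at {snr_db:.0f} dB SNR: {score:.3f}")
```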
DFG Programme Research Grants
International Connection China, China (Hong Kong)
Co-Investigator Dr. Anna Warzybok
 
 
