Project Details
Investigation of noise generation in the three-dimensional time-varying vocal tract
Applicant
Professor Dr. Peter Birkholz
Subject Area
Applied Linguistics, Computational Linguistics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 572918001
One of the main problems with physics-based speech synthesis is the generation of realistic frication noise, which occurs when air flows through a narrow constriction in the vocal tract and becomes turbulent. While the governing equations of aeroacoustic noise generation are well known, their three-dimensional numerical simulation is too slow for many practical applications. Therefore, most noise source models predict the acoustic noise characteristics based on a few relevant one-dimensional aerodynamic and articulatory quantities, such as the volume velocity through the constriction, the pressure drop across the constriction, and its cross-sectional area. These models usually assume an instantaneous effect of the aerodynamic and articulatory quantities on the noise characteristics. However, some studies indicate that this assumption no longer holds when the articulatory boundary conditions change rapidly, e.g. when phonation modulates the airflow in voiced fricatives, or immediately after the closure release of plosives. The proposed project aims to develop a dynamic noise source model that takes into account the effects of rapidly changing boundary conditions on the generated frication noise. In addition, the effects of other relevant factors on the generated noise will be investigated, e.g. the cross-sectional shape of the critical constriction and the soft vocal tract walls. To investigate these dependencies, we will construct a mechatronic vocal tract model and use it to generate a large number of vowel-consonant-vowel utterances. This will allow us to relate the frication noise generated during the consonants to the time-varying articulatory and aerodynamic conditions. The new noise source model will be implemented and perceptually evaluated as part of the articulatory speech synthesizer VocalTractLab.
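To make the quasi-static assumption concrete, the following is a minimal sketch of a conventional one-dimensional noise source model of the kind the abstract describes: the instantaneous noise source level is derived from the current volume velocity and constriction area alone, via a Reynolds number with a critical threshold. The Re² − Re²_crit form and all parameter values are illustrative assumptions drawn from common practice in the literature, not the model developed in this project.

```python
import math

def reynolds_number(volume_velocity, area, rho=1.2, mu=1.8e-5):
    """Reynolds number of the flow at a constriction.

    volume_velocity: airflow through the constriction [m^3/s]
    area:            cross-sectional area of the constriction [m^2]
    rho, mu:         density and dynamic viscosity of air (approximate).
    The characteristic length is the equivalent circular diameter.
    """
    d = 2.0 * math.sqrt(area / math.pi)   # equivalent diameter [m]
    v = volume_velocity / area            # mean flow velocity [m/s]
    return rho * v * d / mu

def noise_source_amplitude(volume_velocity, area, re_crit=1800.0):
    """Quasi-static noise source strength (arbitrary units).

    Zero below a critical Reynolds number (laminar flow), and growing
    with Re^2 - Re_crit^2 above it. Note the instantaneous mapping:
    the output depends only on the current aerodynamic state, with no
    memory of how quickly the boundary conditions are changing.
    """
    re = reynolds_number(volume_velocity, area)
    return max(0.0, re * re - re_crit * re_crit)
```

For example, a flow of 100 ml/s through a 10 mm² constriction yields a Reynolds number above the assumed threshold and hence a nonzero source amplitude, whereas 10 ml/s through the same constriction stays laminar and produces none. It is precisely this memoryless mapping that breaks down under rapidly changing boundary conditions, motivating the dynamic model proposed here.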
DFG Programme
Research Grants
