Project Details

Wideband acoustic modeling of speech

Subject Area: General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages; Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 418848246
 
Final Report Year 2022

Final Report Abstract

In this project, an efficient simulation framework for 3D vocal tract acoustics has been developed and implemented as a special version of the articulatory synthesizer VocalTractLab: VocalTractLab3D. It is open source and freely available to the speech science community. It makes it possible to run 3D acoustic simulations of the vocal tract (computation of the acoustic field and of transfer functions) on a standard laptop computer, without any specific knowledge of programming or physics simulation. In this project, the simulations performed with this software confirmed and further documented the acoustic impact of the vocal tract curvature and of fine cross-sectional area variations (e.g. the shift of resonance frequencies, or additional resonances at high frequencies, above 4-5 kHz). In combination with the experimental part of the project, the impact of the vocal tract on the speech radiation pattern was furthermore highlighted. In particular, the effect of higher-order modes, which cause substantial changes of radiation patterns within small frequency intervals, was confirmed and further documented. VocalTractLab3D also provides transfer functions that can be used to synthesize stimuli in order to investigate the perceptual impact of the acoustic model (1D vs. 3D). This tool can be used in the future by the speech science community to investigate various questions requiring an accurate modeling of vocal tract acoustics, such as questions of speaker identity or the perceptual consequences of various articulatory parameters. The experimental investigation of speech directivity mechanisms allowed us to better understand the role of various anatomical elements (torso, head, lips, vocal tract) in the radiation of speech sounds. As an example, the torso diffraction pattern was further documented, and it was found that the lips enhance this pattern.
These results are valuable for building efficient speech radiation models, which can be used for the auralisation of speakers in virtual reality or with loudspeaker arrays. The perceptual studies showed that phonemes synthesized with 1D and 3D acoustic models can be perceptually discriminated, and that a 3D acoustic model generates more natural sounding stimuli, at least for specific phonemes. Thus, increasing the accuracy of the acoustic model could potentially increase the naturalness of articulatory synthesis. Furthermore, the acoustic field visualisation tool of VocalTractLab3D helps explain the cause of this difference by identifying transverse resonances, which potentially have a strong perceptual impact. This is promising for the development of synthesis tools with an even better tradeoff between accuracy and efficiency, obtained by targeting the perceptually relevant phenomena.
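The synthesis of stimuli from computed transfer functions mentioned above rests on the source-filter principle: a glottal excitation signal is filtered by the vocal tract's acoustic response. The following is a minimal, self-contained sketch of that principle; it does not use the VocalTractLab3D API, and the formant frequencies, bandwidths, and function names are illustrative assumptions, not values from the project.

```python
import numpy as np

def synthesize_vowel(formants, bandwidths, f0=120.0, fs=16000, dur=0.5):
    """Source-filter synthesis sketch: an impulse-train glottal source is
    passed through a cascade of second-order resonators, each approximating
    one vocal tract resonance (formant). All parameter values below are
    illustrative, not taken from VocalTractLab3D."""
    n = int(fs * dur)
    # Glottal source: periodic impulse train at the fundamental frequency f0
    source = np.zeros(n)
    source[::int(fs / f0)] = 1.0
    signal = source
    # One two-pole resonator per formant, applied in cascade
    for fc, bw in zip(formants, bandwidths):
        r = np.exp(-np.pi * bw / fs)          # pole radius from bandwidth
        theta = 2.0 * np.pi * fc / fs         # pole angle from center frequency
        a1, a2 = -2.0 * r * np.cos(theta), r * r
        out = np.zeros(n)
        for i in range(n):
            out[i] = signal[i] - a1 * out[i - 1] - a2 * out[i - 2]
        signal = out
    return signal / np.max(np.abs(signal))    # normalize peak amplitude

# An /a/-like vowel with three hypothetical resonances
audio = synthesize_vowel(formants=[700, 1200, 2600], bandwidths=[80, 90, 120])
```

A 3D simulation replaces the simple resonator cascade with a measured or computed transfer function, which also captures curvature effects and the high-frequency transverse resonances discussed above.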
