Broadband Acoustic Modeling of Speech
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Summary of project results
In this project, an efficient simulation framework for 3D vocal tract acoustics has been developed and implemented as a special version of the articulatory synthesizer VocalTractLab: VocalTractLab3D. It is open source and freely available to the speech science community. It makes it possible to run 3D acoustic simulations of the vocal tract (computation of the acoustic field and of transfer functions) on a standard laptop computer, without any specific knowledge of programming or physics simulation. The simulations performed with this software confirmed and further documented the acoustic impact of the vocal tract curvature and of fine cross-sectional area variations, e.g. the shift of resonance frequencies or additional resonances at high frequencies (above 4-5 kHz). In combination with the experimental part of the project, the impact of the vocal tract on the speech radiation pattern was also highlighted. In particular, the effect of higher-order modes, which cause substantial changes of the radiation pattern within small frequency intervals, was confirmed and further documented.

VocalTractLab3D also provides transfer functions with which stimuli can be synthesized in order to investigate the perceptual impact of the acoustic model (1D vs. 3D). This tool can be used in the future by the speech science community to investigate various questions requiring an accurate modeling of vocal tract acoustics, such as questions of speaker identity or the perceptual consequences of various articulatory parameters.

The experimental investigation of the speech directivity mechanisms allowed us to better understand the role of various anatomical elements (torso, head, lips, vocal tract) in the radiation of speech sounds. As an example, the torso diffraction pattern was further documented, and it was found that the lips enhance this pattern. These results are valuable for building efficient speech radiation models, which can be used for the auralisation of speakers in virtual reality or with loudspeaker arrays.

The perceptual studies showed that phonemes synthesized with 1D and 3D acoustic models can be perceptually discriminated, and that a 3D acoustic model generates more natural sounding stimuli, at least for specific phonemes. Thus, increasing the accuracy of the acoustic model could potentially increase the naturalness of articulatory synthesis. Furthermore, the acoustic field visualisation tool of VocalTractLab3D facilitates the understanding of the cause of this difference by identifying transverse resonances, which potentially have a strong perceptual impact. This is promising for the development of synthesis tools with an even better trade-off between accuracy and efficiency, obtained by targeting the perceptually relevant phenomena.
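As an illustration of how transfer functions of the kind provided by VocalTractLab3D can be used to synthesize stimuli, the following Python sketch filters a simple glottal source through a vocal tract transfer function. It assumes a transfer function exported as a plain text file with columns for frequency, real part and imaginary part; the file name, the column layout and the impulse-train source are illustrative assumptions and not features of the software.

```python
# Minimal sketch: synthesize a vowel-like stimulus from a vocal tract
# transfer function. The input file format (columns: f [Hz], Re(H), Im(H))
# and the impulse-train glottal source are illustrative assumptions.
import numpy as np

fs = 44100            # sampling rate in Hz
n_fft = 4096          # FFT length used to build the vocal tract filter

# Load the transfer function samples (hypothetical export format).
data = np.loadtxt("transfer_function.txt")
freqs = data[:, 0]
H_samples = data[:, 1] + 1j * data[:, 2]

# Interpolate magnitude and unwrapped phase onto the FFT frequency grid
# (assumes a dense, smooth sampling of the transfer function up to fs/2).
grid = np.linspace(0.0, fs / 2, n_fft // 2 + 1)
mag = np.interp(grid, freqs, np.abs(H_samples))
phase = np.interp(grid, freqs, np.unwrap(np.angle(H_samples)))
H = mag * np.exp(1j * phase)

# Impulse response of the vocal tract filter.
h = np.fft.irfft(H, n=n_fft)

# Crude glottal source: impulse train with a 120 Hz fundamental frequency.
duration = 0.5                                   # seconds
source = np.zeros(int(duration * fs))
source[:: int(fs / 120)] = 1.0

# Vowel-like stimulus: the source filtered by the vocal tract response.
stimulus = np.convolve(source, h)[: len(source)]
stimulus /= np.max(np.abs(stimulus))             # normalise for playback
```

For a 1D vs. 3D comparison as described above, the same source signal would simply be filtered through the two corresponding transfer functions before listening tests.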
Project-related publications (selection)
- Brandner, Manuel; Blandin, Rémi; Frank, Matthias & Sontacchi, Alois: A pilot study on the influence of mouth configuration and torso on singing voice directivity. The Journal of the Acoustical Society of America, 148(3), 1169-1180.
- Birkholz, Peter; Kürbis, Steffen; Stone, Simon; Häsner, Patrick; Blandin, Rémi & Fleischer, Mario: Printable 3D vocal tract shapes from MRI data and their acoustic and aerodynamic properties. Scientific Data, 7(1).
- Blandin, Rémi; Arnela, Marc; Félix, Simon; Doc, Jean-Baptiste & Birkholz, Peter: Comparison of the Finite Element Method, the Multimodal Method and the Transmission-Line Model for the Computation of Vocal Tract Transfer Functions. Interspeech 2021, 3330-3334. ISCA.
- Blandin, Rémi; Arnela, Marc; Félix, Simon; Doc, Jean-Baptiste & Birkholz, Peter: Efficient 3D Acoustic Simulation of the Vocal Tract by Combining the Multimodal Method and Finite Elements. IEEE Access, 10, 69922-69938.
