
Perception of Social Characteristics of Synthetic Speakers

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term from 2019 to 2023
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 423651352
 
Year of creation 2023

Summary of Project Results

In the research project “Social Perceptions of Synthetic Speakers” we propose a workflow for generating socially acceptable synthetic voices. A series of subjective evaluations showed that social perceptions are underlying dimensions that can only be interpreted through a combination of adjectives. However, using a long list of adjectives to evaluate voice conversion (VC) or text-to-speech (TTS) systems is impractical. Social perceptions therefore either need to be interpreted through a series of evaluations on various adjective combinations, or objective metrics need to be developed for such evaluations. Accordingly, we analyzed the acoustic features that contribute to various social perceptions and used them to predict these perceptions automatically from synthetic speech. The data for this automatic prediction came from the subjective evaluations carried out in work package 1, and the number of samples was smaller than the number of acoustic features extracted per speech sample with the OpenSMILE toolkit. We applied multiple dimensionality reduction techniques to reduce the number of dimensions while also accounting for multicollinearity between the acoustic features. Nevertheless, because of the limited data size, we could explore only linear regression and Support Vector Regressors in the current experiments. We therefore encourage the research community to explore different social perceptions of synthetic voices and to publish the evaluation results, so that the community can use them to build better evaluation metrics and models for social perceptions. We also show that social perceptions are separable and transferable from one speaker to another. This was demonstrated in the voice conversion experiments presented in work package 3.
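The prediction setup described above (fewer rated samples than acoustic features, dimensionality reduction, then a linear model) can be illustrated with a minimal sketch. It is not the project's pipeline: the data are randomly generated stand-ins for acoustic features and listener ratings, principal component analysis is assumed as one possible reduction technique, and all function names are hypothetical.

```python
import numpy as np

def fit_pca_linear(X, y, n_components):
    """Standardize X, project onto the top principal components, fit least squares."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    # The components are mutually orthogonal, which sidesteps the
    # multicollinearity present among the raw features.
    design = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return mean, std, components, coef

def predict(model, X):
    """Apply the stored standardization, projection, and linear coefficients."""
    mean, std, components, coef = model
    scores = ((X - mean) / std) @ components.T
    return coef[0] + scores @ coef[1:]

def leave_one_out_rmse(X, y, n_components):
    """Refit on all samples but one and score the held-out sample each time."""
    errors = []
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        model = fit_pca_linear(X[train], y[train], n_components)
        errors.append(predict(model, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic stand-in data (assumption): 40 rated samples, 200 correlated features.
rng = np.random.default_rng(0)
n_samples, n_features = 40, 200
latent = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_features))
X = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_features))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.1 * rng.normal(size=n_samples)

rmse = leave_one_out_rmse(X, y, n_components=5)
baseline = float(np.std(y))  # error of always predicting the mean rating
print(f"leave-one-out RMSE: {rmse:.3f}, baseline: {baseline:.3f}")
```

Leave-one-out validation is used here only because it suits very small datasets; the number of retained components is an arbitrary choice for the illustration.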
Furthermore, we recommend using synthetic voices of high speech quality and naturalness as source and target speakers in voice conversion experiments, because applying VC to TTS voices passes the speech through generation twice (TTS, then VC). Signal manipulation techniques can also be applied to TTS voices to change their social perceptions. Since we know which acoustic features contribute to the various social perceptions, specific acoustic features can be modified using tools like Praat. Finally, we explored modifying the synthesis procedure itself by introducing acoustic correlates of warmth and competence into the training mechanism of a TTS system. The current experiments used a linear combination of the acoustic correlates of warmth and competence; other combinations can be explored in the future, depending on the coefficient values of the features. When computing the weighted combination, a feature with a negative coefficient could be assigned a lower weight, while a feature with a positive coefficient would receive a higher weight.
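The weighting idea in the last two sentences can be sketched as follows. This is an illustration under stated assumptions, not the scheme used in the project: the feature names and coefficient values are hypothetical, and a softmax is assumed as one way to turn signed coefficients into positive weights that rank features in the same order as their coefficients.

```python
import math

def coefficients_to_weights(coefficients):
    """Map signed coefficients to positive weights that sum to 1 (softmax)."""
    peak = max(coefficients.values())  # subtract the max for numerical stability
    exps = {name: math.exp(c - peak) for name, c in coefficients.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

def weighted_combination(features, weights):
    """Weighted linear combination of (already normalized) feature values."""
    return sum(weights[name] * features[name] for name in weights)

# Hypothetical regression coefficients of three acoustic correlates.
coefficients = {"f0_mean": 0.8, "hnr": 0.3, "jitter": -0.5}
# Hypothetical z-scored feature values for a single utterance.
features = {"f0_mean": 0.2, "hnr": 1.1, "jitter": -0.4}

weights = coefficients_to_weights(coefficients)
score = weighted_combination(features, weights)
print(weights, score)
```

With this mapping, the feature with the negative coefficient receives the smallest weight and the feature with the largest positive coefficient receives the largest, which matches the ordering described in the text.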

