Project Details

Quality Prediction for Speech Processed or Generated with Machine Learning Techniques

Subject Area: Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Methods in Artificial Intelligence and Machine Learning
Term: since 2025
Project identifier: Deutsche Forschungsgemeinschaft (DFG), project number 558877568
 
Speech signals are increasingly processed or generated by machine learning (ML) algorithms. Although ML-based algorithms frequently outperform traditional ones in coding, enhancement, speaker transformation and anonymization, and synthesis tasks, it is unclear how the perceptual effects they introduce affect human-perceived quality. ML-based algorithms are commonly optimized using proxy measures of performance rather than human-perceived quality, because quality prediction models that provide valid estimates for ML-processed or ML-generated speech signals are lacking. This research project aims to fill this gap by developing an open-source model that predicts speech quality and its underlying perceptual dimensions for a range of ML-based processing and generation algorithms. A database is created that is representative of state-of-the-art algorithms applied to German and English source files. In a first step, the perceptual dimensions underlying perceived speech quality are determined via expert and crowdsourcing listening tests, using a Semantic Differential approach. In a second step, each sample of the database is rated for overall quality and for each perceptual quality dimension in additional crowdsourcing tests. The results form the target values for an ML-based model that predicts both the perceptual dimensions and overall quality from the individual speech signals. The model will use pre-trained generative models, which need to be adapted to the quality prediction task; the effects of model architecture, fine-tuning, and transfer learning will be analyzed to increase model validity and reliability. Finally, the model will be evaluated on a new database reflecting ML-based algorithms that are not yet available at the start of the project, in order to test its generalizability.
The resulting databases and models will be made available as open source to the scientific community and to international standardization.
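As a rough illustration of how crowdsourced ratings could be turned into target values, and how a model's predictions could then be compared with them, the following Python sketch averages per-sample ratings into mean opinion scores (MOS) with a 95% confidence interval and computes Pearson correlation and RMSE between predicted and observed scores. The record layout, the assumed 1 to 5 rating scale, the scale name `noisiness`, and all function names are hypothetical; the project's actual test design, perceptual dimensions, and evaluation criteria are not specified in this description.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rating record: (sample_id, scale_name, score on an assumed 1..5 scale).
# "overall" and "noisiness" below are placeholder scale names, not project results.
Rating = tuple[str, str, float]


def aggregate_mos(ratings: list[Rating]) -> dict[tuple[str, str], tuple[float, float]]:
    """Average ratings per (sample, scale) into a MOS and a 95% CI half-width."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for sample_id, scale, score in ratings:
        groups[(sample_id, scale)].append(score)

    targets = {}
    for key, scores in groups.items():
        mos = mean(scores)
        # Normal-approximation CI; undefined (NaN) when only one vote exists.
        if len(scores) > 1:
            half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
        else:
            half_width = float("nan")
        targets[key] = (mos, half_width)
    return targets


def pearson(pred: list[float], true: list[float]) -> float:
    """Pearson linear correlation between predicted and observed MOS."""
    if len(pred) != len(true) or len(pred) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mp, mt = mean(pred), mean(true)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in true))
    if sp == 0 or st == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / (sp * st)


def rmse(pred: list[float], true: list[float]) -> float:
    """Root-mean-square error between predicted and observed MOS."""
    if len(pred) != len(true) or not pred:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(mean((p - t) ** 2 for p, t in zip(pred, true)))


if __name__ == "__main__":
    ratings = [
        ("s1", "overall", 4), ("s1", "overall", 5), ("s1", "overall", 4),
        ("s2", "overall", 2), ("s2", "overall", 3), ("s2", "overall", 2),
        ("s1", "noisiness", 4), ("s1", "noisiness", 4),
    ]
    targets = aggregate_mos(ratings)
    observed = [targets[("s1", "overall")][0], targets[("s2", "overall")][0]]
    predicted = [4.1, 2.6]  # hypothetical model outputs
    print(targets)
    print(pearson(predicted, observed), rmse(predicted, observed))
```

Summarizing listening-test votes as a MOS with a confidence interval is a common convention for ACR-style tests, and Pearson correlation and RMSE are common (though not the only) criteria for comparing a quality prediction model with subjective scores; a real evaluation would likely also account for per-listener bias and rating-scale nonlinearity.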
DFG Programme: Research Grants
 
 
