Project Details

Diffusion-Based Deep Generative Models for Speech Processing

Subject Area: Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term: since 2024
Project identifier: Deutsche Forschungsgemeinschaft (DFG) - Project number 545210893
 
Recently, a novel and exciting generative machine learning approach has gained increasing interest in the machine learning, computer vision, and speech communities: diffusion-based generative models, or simply diffusion models. These models are based on the idea of gradually turning data into noise (forward diffusion process) and training a neural network to invert this process across different noise scales (reverse diffusion process). The forward and reverse diffusion processes have been modeled either with Markov chains or with stochastic differential equations (SDEs).

We recently proposed SDE-based diffusion models for speech enhancement that incorporate a drift term, which makes it possible to also use recorded real-world environmental noise during training. We have shown that this generative approach is very powerful and outperforms competing discriminative approaches in cross-corpora evaluations, indicating very good generalization. However, many open questions remain that we want to tackle in this project. Our objectives are to make diffusion models capable of real-time processing with only modest latency by reducing their memory and computational footprint, and to investigate novel methods for increasing the robustness of diffusion models in challenging acoustic scenarios.
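The forward process with a drift term can be illustrated with a minimal sketch: an Euler-Maruyama simulation of an Ornstein-Uhlenbeck-type SDE whose drift pulls the clean signal toward a noisy recording while Gaussian noise is injected. All parameter names and values here are illustrative assumptions, not the project's actual model.

```python
import numpy as np

def forward_sde_euler(x0, y, gamma=1.5, sigma=0.5, T=1.0, n_steps=200, rng=None):
    """Euler-Maruyama simulation of an illustrative forward SDE
        dx_t = gamma * (y - x_t) dt + sigma dW_t,
    whose drift term pulls the clean signal x0 toward the noisy recording y
    while Gaussian noise is injected at each step.
    NOTE: gamma, sigma, T, and n_steps are illustrative assumptions.
    """
    rng = rng or np.random.default_rng(0)
    dt = T / n_steps
    x = x0.copy()
    for _ in range(n_steps):
        drift = gamma * (y - x)                 # mean-reverting pull toward y
        diffusion = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift * dt + diffusion
    return x

# Toy example: a sinusoid as 'clean speech', plus noise as the 'recording'.
rng = np.random.default_rng(1)
x0 = np.sin(2 * np.pi * np.arange(256) / 16.0)
y = x0 + 0.8 * rng.standard_normal(x0.shape)
xT = forward_sde_euler(x0, y)                   # xT is closer to y than x0 is
```

A neural network trained to reverse such steps (the reverse diffusion process) would then map noisy recordings back toward clean speech.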
DFG Programme: Research Grants
