Project Details
Acoustics-Aware Generative and Predictive Models for Binaural Speech Extraction and Reproduction in Hearables
Applicant
Professor Dr.-Ing. Timo Gerkmann
Subject Area
Communication Technology and Networks, High-Frequency Technology and Photonic Systems, Signal Processing and Machine Learning for Information Technology
Acoustics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term
since 2025
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 561088164
Hearables are wearable devices such as headphones or earbuds that typically incorporate various sensors, including health-monitoring sensors and multiple microphones, and offer advanced wireless connectivity, usually via Bluetooth. The market share of wireless devices in overall headphone sales is steadily increasing. Given the widespread availability of hearables, there is strong demand for sophisticated algorithms tailored to their capabilities. This project focuses on enhancing the assistive listening capabilities of hearables. An essential requirement for hearables is the preservation of binaural cues, which carry the information needed for accurate spatial perception. Consequently, this project's objective goes beyond creating state-of-the-art speaker separation and noise reduction algorithms; it also aims to preserve the auditory spatial impression for the hearing-device user.
To address these research topics, we will first explore data-driven methods for source localization and tracking. Tracking the direction of arrival of sound sources becomes considerably more difficult on hearables, primarily because rapid head movements of the device wearer can quickly change the entire acoustic scene. We therefore aim to develop a fast and efficient tracking mechanism that leverages contemporary techniques from time-series analysis.
Secondly, accurate binaural reproduction, along with informed speaker extraction and separation, requires access to individualized head-related transfer functions (HRTFs). Traditional HRTF measurement procedures are labor-intensive and require extensive setups with specialized equipment in an anechoic chamber. Our objective is to investigate methods that allow HRTFs to be measured in regular room environments, minimizing the number of measurements while ensuring high quality.
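The binaural cues mentioned above are dominated by the interaural time difference (ITD) and the interaural level difference (ILD) between the two ear signals. As a hedged illustration only (not the project's method), the following sketch estimates both cues from a pair of ear signals, using cross-correlation for the ITD and an energy ratio for the ILD; the function name and toy signals are our own for demonstration.

```python
import numpy as np

def binaural_cues(left, right, fs):
    """Estimate ITD (seconds) and ILD (dB) from left/right ear signals.

    Illustrative sketch: ITD is taken as the lag of the cross-correlation
    peak; ILD as the ratio of signal energies in decibels.
    """
    # Cross-correlation over all lags; peak index minus (len(right)-1)
    # gives the lag. A negative lag means the left-ear signal leads.
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)
    itd = lag / fs
    # Level difference: positive ILD means the left ear is louder.
    ild = 10 * np.log10(np.sum(left**2) / np.sum(right**2))
    return itd, ild

# Toy example: the same pulse reaches the right ear 5 samples later
# and attenuated, as if the source were on the listener's left.
fs = 16000
sig = np.zeros(100)
sig[20] = 1.0
left, right = sig, 0.5 * np.roll(sig, 5)
itd, ild = binaural_cues(left, right, fs)
```

Speech extraction algorithms that distort these quantities differently in the two channels degrade the listener's spatial impression, which is why cue preservation is an explicit design goal of the project.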
Ultimately, we aim to enable the integration of real, acoustically measured, individualized HRTFs into future hearables through a user-friendly procedure. Next, we will investigate two paradigms for acoustics-aware binaural speech extraction and reproduction: predictive models and generative models. While predictive methods are widely used for speech extraction, generative methods are gaining significant attention, especially in single-microphone scenarios. By leveraging acoustic models, we aim to adapt these data-driven predictive and generative methods to the binaural context, with an emphasis on cue preservation and on enhancing spatial interpretability within the network design. We will assess the advantages and limitations of both approaches on these tasks and derive insights from the comparison. Beyond the scientific advances, we will build a demo platform that integrates all relevant modules so that they work together. All algorithms and their integration will undergo comprehensive evaluation using both public databases and data collected specifically for this project.
DFG Programme
Research Grants
International Connection
Israel
Partner Organisation
The Israel Science Foundation
Cooperation Partner
Professor Sharon Gannot, Ph.D.
