Project Details
Speech Processing in Health Sciences
Subject Area
Computer Science
Term
since 2026
Project identifier
Deutsche Forschungsgemeinschaft (DFG) - Project number 549142762
Similar to a biomarker, speech carries information about the speaker's physiological and psychological state, which manifests itself along the whole complex speech production process. Previous research has shown that, aside from the obvious pathologies, neurodegenerative diseases and psychological conditions can be reliably detected and assessed from speech, making speech processing a viable noninvasive diagnostic aid. The past years have brought significant improvements in automatic speech recognition (ASR), natural language processing (NLP) and conversational agents (CA). However, current state-of-the-art (SOTA) transformer-based ASR systems struggle with atypical speech, are often trained on biased data, and tend to produce smoothed, easily readable output rather than a verbatim transcript. Similarly, large language models (LLMs) are trained for task completion with a minimum of turn-taking rather than to behave like a CA. This proposal aims to understand and tackle the shortcomings of current SOTA models for atypical speech, in order to design, develop, and deploy models that can be used as diagnostic aids and in health-related human-machine and human-human interaction. At the same time, this proposal aims to reduce its resource footprint and explore ways to securely handle privacy-sensitive data. The proposal consists of core projects that build the foundations for application-specific verticals. Verbatim rich transcription of atypical speech forms the basis for diagnostic and downstream NLP tasks. Foundation models for atypical speech capture latent cues about the physical and emotional state of the speaker, to provide cues for diagnostic aids or to augment interaction scenarios. LLMs for conversational interaction are used for the analysis and generation of natural language in ambient assisted living and online counseling. 
Low-resource computing investigates model compression and quantization to reduce training and inference cost, and neuromorphic computing for embedded real-time computation and low power consumption. Since speech data is highly sensitive, confidential computing investigates cryptographic protocols and architectures for training and inference. The verticals explore new applications: a speech-based sleep diary as an objective sleep quality assessment; conversational agents that combine improved autonomy of impaired patients with health-related monitoring; multi-modal CA for improved acceptance and success of psychosocial online counseling; embedded low-power speech synthesis as a voice prosthesis; and automated hermeneutic coding. The necessary clinical and field studies are centrally coordinated. In a joint effort, this proposal will implement a Universal Benchmark for Atypical Speech, comprising transcription accuracy and diagnostic tasks, to define a holistic evaluation for the research community.
DFG Programme
Research Impulses
Applicant Institution
Technische Hochschule Nürnberg Georg Simon Ohm
Spokesperson
Professor Dr.-Ing. Korbinian Riedhammer
Participating Researchers
Professor Dr.-Ing. Jens Albrecht; Professor Dr.-Ing. Cristian Axenie; Professorin Dr. Christina Bartenschlager; Professor Dr. Tobias Bocklet; Professor Dr. Robert Lehmann; Professor Dr. Hans Löhr; Professorin Dr. Kneginja Richter; Professor Dr. Frank Sowa; Professor Dr. Sven Winkelmann
