Automatic Recognition of Audible and Silently Spoken Speech Based on Electromyographic Signals Recorded with Electrode Arrays
Image and Speech Processing, Computer Graphics and Visualization, Human Computer Interaction, Ubiquitous and Wearable Computing
Summary of Project Results
Silent Speech Interfaces (SSIs) are methods, devices, and approaches that do not rely on audible speech as their input signal. Instead, SSIs allow users to communicate silently with each other or with machines by mouthing words without producing any sound. Because SSI-based spoken communication is silent, it offers several benefits. First, phone conversations and other voice-driven interactions with machines can be carried out in public without disturbing the surroundings; even settings such as call centers could turn from noisy into quiet environments. Second, private conversations and interactions such as banking transactions or online shopping that involve PINs and passwords can no longer be overheard by bystanders, which makes SSI-based spoken communication private and confidential. Third, SSIs could offer an alternative to people who have lost their voice due to disease or accident.
In this project, we developed and investigated an SSI that uses surface electromyography (EMG) to recognize spoken utterances. EMG is the recording of electrical muscle activity with electrodes. When a muscle fiber is activated, small electrical currents in the form of ion flows are generated. These currents propagate through the body tissue, whose resistance gives rise to potential differences that are measured between regions on the body surface. The MAPS project extended the conventional setup of single electrodes placed on a speaker's face to novel multichannel electrode grids and arrays. The resulting high-dimensional input signals offer several advantages, including but not limited to (1) greater spatial resolution, (2) robustness to electrode position shifts, and (3) considerably improved user-friendliness of the device. We systematically studied the impact of sensor arrays, fundamentally improved performance and usability through innovative algorithms, and released a data corpus of parallel EMG and speech recordings.
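The measurement principle described above (potential differences between regions on the skin) and the step from single electrodes to electrode grids can be illustrated with a minimal sketch. It assumes a hypothetical rectangular grid of monopolar electrodes stored as a NumPy array of shape (samples, rows, columns). The sketch derives bipolar channels from horizontally adjacent electrodes, computes frame-wise root-mean-square (RMS) features per channel, and estimates a column-wise position shift between two sessions by matching per-channel energy maps. The function names, grid size, frame length, hop size, and the shift-search procedure are illustrative assumptions, not the MAPS project's actual processing pipeline.

```python
import numpy as np


def bipolar_derivation(monopolar: np.ndarray) -> np.ndarray:
    """Bipolar channels from a monopolar electrode grid (hypothetical layout).

    monopolar: shape (n_samples, n_rows, n_cols); each value is one electrode's
    potential against a common reference.
    Returns shape (n_samples, n_rows, n_cols - 1): the potential difference
    between each pair of horizontally adjacent electrodes.
    """
    return monopolar[:, :, 1:] - monopolar[:, :, :-1]


def frame_rms(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Root-mean-square of each channel over sliding frames.

    signal: shape (n_samples, ...); returns shape (n_frames, ...).
    """
    if signal.shape[0] < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.shape[0] - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )
    return np.sqrt(np.mean(frames ** 2, axis=1))


def estimate_column_shift(ref_map: np.ndarray, new_map: np.ndarray, max_shift: int) -> int:
    """Integer shift s such that new_map[:, c + s] best matches ref_map[:, c].

    Both maps have shape (n_rows, n_cols), e.g. the mean RMS energy per
    channel in two recording sessions. The mean squared error is computed
    only over the columns that overlap for a given shift (hypothetical
    procedure, for illustration only).
    """
    n_cols = ref_map.shape[1]
    best_shift, best_err = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n_cols, n_cols - s)
        if hi <= lo:
            continue  # no overlapping columns for this shift
        err = np.mean((new_map[:, lo + s : hi + s] - ref_map[:, lo:hi]) ** 2)
        if err < best_err:
            best_shift, best_err = s, err
    return best_shift


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical recording: 1000 samples from a 4 x 8 electrode grid.
    mono = rng.standard_normal((1000, 4, 8))
    bip = bipolar_derivation(mono)                 # shape (1000, 4, 7)
    feats = frame_rms(bip, frame_len=27, hop=10)   # shape (98, 4, 7)
    vectors = feats.reshape(len(feats), -1)        # shape (98, 28): one vector per frame
    print(bip.shape, feats.shape, vectors.shape)

    # Simulate a second session in which the grid slipped by two columns.
    ref_map = feats.mean(axis=0)
    new_map = rng.random(ref_map.shape)
    new_map[:, 2:] = ref_map[:, :-2]
    print(estimate_column_shift(ref_map, new_map, max_shift=3))  # prints 2
```

Differences between horizontal neighbors are only one possible bipolar montage; vertical pairs or other derivations could be formed the same way. The shift search shows, in a simplified form, why a dense grid can help with robustness to position shifts: a displaced recording still overlaps with the reference on most channels.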
Our experimental results indicate that the use of multichannel arrays significantly advanced EMG-based speech recognition. The resulting solutions are highly relevant both to advancing the scientific understanding of speech production and perception in terms of muscle activity and to the modeling of EMG-based speech units, thus pushing the limits of practical Silent Speech applications. We expect growing interest in Silent Speech Interfaces and in the development of novel communication devices, benefiting individuals and society at large. Since the start of the project, our work on SSIs has received a best journal paper award and a best student paper award, led to special sessions and numerous talks around the globe, and was featured in several TV and print media (e.g., BBC, Sendung mit der Maus (ARD), nano (3sat)). A special issue entitled “Biosignal-based Spoken Communication” in the IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP) was accepted by the IEEE publication board for 2017.
Project-related Publications (Selection)
-
Array-based Electromyographic Silent Speech Interface; Proceedings of the 6th International Conference on Bio-inspired Systems and Signal Processing (Biosignals), Barcelona, Spain, pages 89 – 96
Wand Michael, Schulte Christopher, Janke Matthias, Schultz Tanja
-
Artifact Removal Algorithm for an EMG-based Silent Speech Interface; Proceedings of the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 2013, pages 5750 – 5753
Wand Michael, Himmelsbach Adam, Heistermann Till, Janke Matthias, Schultz Tanja
-
Application of Electrode Arrays for Artifact Removal in an Electromyographic Silent Speech Interface; Biomedical Engineering Systems and Technologies, Communications in Computer and Information Science, Volume 452, Springer Berlin Heidelberg, pages 300 – 312
Wand Michael, Janke Matthias, Heistermann Till, Schulte Christopher, Himmelsbach Adam, Schultz Tanja
-
Compensation of Recording Position Shifts for a Myoelectric Silent Speech Recognizer; Proceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, pages 2113 – 2117
Wand Michael, Schulte Christopher, Janke Matthias, Schultz Tanja
-
Pattern Learning with Deep Neural Networks in EMG-based Speech Recognition; Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Chicago, IL, USA, pages 4200 – 4203
Wand Michael, Schultz Tanja
-
Tackling Speaking Mode Varieties in EMG-based Speech Recognition; IEEE Transactions on Biomedical Engineering, Volume 61, Issue 10, pages 2515 – 2526
Wand Michael, Janke Matthias, Schultz Tanja
-
Codebook Clustering for Unit Selection based EMG-to-Speech Conversion; Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech), Dresden, Germany, pages 2420 – 2424
Diener Lorenz, Janke Matthias, Schultz Tanja
-
Direct Conversion from Facial Myoelectric Signals to Speech using Deep Neural Networks; Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland
Diener Lorenz, Janke Matthias, Schultz Tanja
-
An Initial Investigation into the Real-Time Conversion of Facial Surface EMG Signals to Audible Speech; Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, Florida, USA, August 2016
Diener Lorenz, Herff Christian, Janke Matthias, Schultz Tanja
-
Biosignal-based Spoken Communication: A Survey; IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 25, Issue 12, December 2017, pages 2257 – 2271
Schultz Tanja, Wand Michael, Hueber Thomas, Krusienski Dean J., Herff Christian, Brumberg Jonathan S.