Project Details

Simultaneous Interpretation of Lectures into/from German

Subject Area General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2017 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 326928774
 
Final Report Year 2022

Final Report Abstract

Speech translation (ST) is one of the most challenging tasks in language technology, yet also one of the most attractive and interesting from an application point of view. In this project, KIT addressed one of the most challenging conditions for speech translation: streaming translation of lectures from and into German. We identified the main issues of this setting, including data sparsity, the quality and latency of the individual components, and domain mismatch. We then investigated novel, advanced techniques to tackle them, and with these we built a high-quality lecture translation system. The results were presented at well-known international conferences and led to successful participation in several international evaluation campaigns. For example, our English speech recognition achieves super-human performance with low latency on a standard test set of conversational speech. Our multilingual translation system pioneered the idea of making the learned shared representation interlingual. In addition, the techniques were integrated into a real-world speech translation application, the KIT lecture translator. Besides being a useful tool for lecturers and students, this speech translation framework also helps us collect more lecture data and user feedback. This opens up further research on how to leverage such data to improve lecture translation systems. We also built an initial prototype of direct speech translation, which encourages the community to create more and larger end-to-end speech translation corpora. Within the project, we also made significant contributions to one of the most actively researched questions in the speech and speech translation community at present: how end-to-end ASR compares with hybrid ASR, and how end-to-end speech translation compares with cascaded speech translation. The developed techniques thus help to narrow the gap between these approaches.
The most valuable lesson learned from this project is how to foresee and estimate the potential of research directions, and how to pursue modern, advanced research along them. Recognizing such directions early enough allows us to contribute substantially to the research community and to achieve high-quality research and applications.
