Project Details

Learning Conversational Action Repair for Intelligent Robots

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2019 to 2024
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 433323019
 
Final Report Year 2025

Final Report Abstract

What are the principal mechanisms required to capture the robustness and interactivity of human communication, given the situational, noisy and often ambiguous nature of natural language? And how, and to what extent, can we integrate these mechanisms within an embodied functional model that is computationally and empirically verifiable? We addressed these research questions by investigating the linguistic phenomenon of conversational repair (CR), a method to edit and re-interpret previously uttered sentences that were not correctly understood by the hearer. Previous computational models for human-robot dialog consider non-understandings, but they do not consider misunderstandings. Misunderstandings are common in natural language communication: they can result from inconsistent world models, erroneous perceptions, or ambiguous instructions. Addressing misunderstandings is important because they can cause a robot to execute unintended, potentially irreversible and destructive actions. For example, given the instruction “bring me the bottle of water”, a robotic listener's vision system might confuse the water with a bottle of cleaning detergent that happens to be nearby. In this case, the operator should be able to utter an interrupting repair command such as “No, erm... stop! No, not the detergent! I mean the water, to your right!” We refer to such commands as conversational action repair (CAR) commands. Previous dialog models for human-robot interaction did not support such commands. Our first step to address CAR was to develop a goal-conditioned reinforcement learning approach based on hindsight learning. This improved the grounding capabilities for instruction following. Our surprising main result was that our new self-speech feedback method can catalyze the learning process. Our second step was to extend the self-speech-based instruction following with action repair commands, and we found that self-speech also improves the learning process in this case.
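
As an illustration of the hindsight idea mentioned above, the following minimal sketch relabels the transitions of a failed episode with goals that were actually reached later in that episode, so that a sparse reward still yields a learning signal. It is not the project's implementation: the `Transition` fields, the `sparse_reward` function, the "future" sampling strategy and the parameter `k` are assumptions made for this example, and a state tuple stands in for a language instruction as the goal.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transition:
    """One step of a goal-conditioned episode (hypothetical layout)."""
    state: tuple
    action: int
    next_state: tuple
    goal: tuple
    reward: float


def sparse_reward(achieved, goal):
    """Assumed sparse reward: 0 when the goal is achieved, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_hindsight(episode, k=2, rng=None):
    """Return extra transitions whose goals are replaced by achieved states.

    For each step, k substitute goals are sampled from the states reached
    at that step or later in the same episode, and the reward is recomputed
    for the substitute goal.
    """
    rng = rng or random.Random(0)
    relabeled = []
    for t, transition in enumerate(episode):
        future = episode[t:]
        for _ in range(k):
            new_goal = rng.choice(future).next_state
            new_reward = sparse_reward(transition.next_state, new_goal)
            relabeled.append(replace(transition, goal=new_goal, reward=new_reward))
    return relabeled


# A toy episode on a line that never reaches the original goal (5,).
episode = [
    Transition((i,), 1, (i + 1,), (5,), sparse_reward((i + 1,), (5,)))
    for i in range(3)
]
extra = relabel_with_hindsight(episode, k=2)
```

In this toy episode every original reward is -1, while the relabeled transitions include steps that count as successes for their substitute goal. Both sets would typically be stored in the replay buffer of an off-policy learner.
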
In addition to these results, we improved the Neuro-Inspired COLlaborator (NICOL), an adult-sized semi-humanoid based on our established NICO robot. We integrated our new ELMiRA (Embodying Language Models in Robot Action) architecture, merging speech, vision-language, and object detection with robot-specific spatial and motion models. This integration enables human-robot interaction and object manipulation tasks. To enhance sim-to-real transfer and imitation learning, we developed neural architectures using image-to-image transfer and differentiable forward kinematics.
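
To make the notion of differentiable forward kinematics concrete, here is a small sketch for a planar two-link arm. Because the end-effector position is a differentiable function of the joint angles, a position error can be propagated back to the joints by gradient descent. The arm geometry, the link lengths, the hand-written Jacobian (standing in for automatic differentiation) and the names `forward_kinematics`, `jacobian` and `fit_joint_angles` are all assumptions for this example and do not describe NICOL's kinematic model.

```python
import math

LINK_1, LINK_2 = 1.0, 1.0  # hypothetical link lengths


def forward_kinematics(q):
    """End-effector position (x, y) of a planar two-link arm."""
    q1, q2 = q
    x = LINK_1 * math.cos(q1) + LINK_2 * math.cos(q1 + q2)
    y = LINK_1 * math.sin(q1) + LINK_2 * math.sin(q1 + q2)
    return x, y


def jacobian(q):
    """Analytic derivative of (x, y) with respect to (q1, q2)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return (
        (-LINK_1 * s1 - LINK_2 * s12, -LINK_2 * s12),
        (LINK_1 * c1 + LINK_2 * c12, LINK_2 * c12),
    )


def fit_joint_angles(target, q_init, lr=0.1, steps=1000):
    """Minimise 0.5 * ||fk(q) - target||^2 by plain gradient descent."""
    q = list(q_init)
    for _ in range(steps):
        x, y = forward_kinematics(q)
        ex, ey = x - target[0], y - target[1]
        J = jacobian(q)
        # Gradient of the loss is J^T e (chain rule through the kinematics).
        g1 = J[0][0] * ex + J[1][0] * ey
        g2 = J[0][1] * ex + J[1][1] * ey
        q[0] -= lr * g1
        q[1] -= lr * g2
    return tuple(q)


q_fit = fit_joint_angles(target=(1.0, 1.0), q_init=(0.0, 1.0))
x_fit, y_fit = forward_kinematics(q_fit)
```

In a learning setting the same gradient path would let a loss defined in Cartesian space train a network that outputs joint angles. An automatic-differentiation framework would normally replace the hand-written Jacobian.
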

Link to the final report

https://doi.org/10.15480/882.15751
