Learning Conversational Action Repair for Intelligent Robots

Applicants Dr.-Ing. Manfred Eppe; Professor Dr. Stefan Wermter

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2019 to 2024

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 433323019

Final Report Year 2025

Final Report Abstract

What are the principal mechanisms required to capture the robustness and interactivity of human communication, given the situational, noisy and often ambiguous nature of natural language? And how, and to what extent, can we integrate these mechanisms within an embodied functional model that is computationally and empirically verifiable? We addressed these research questions by investigating the linguistic phenomenon of conversational repair (CR), a method to edit and re-interpret previously uttered sentences that were not correctly understood by the hearer. Previous computational models for human-robot dialog consider non-understandings, but they do not consider misunderstandings. Misunderstandings are common in natural language communication: they can result from inconsistent world models, erroneous perceptions, or ambiguous instructions. Addressing misunderstandings is important because they can cause a robot to execute unintended, potentially irreversible and destructive actions. For example, given the instruction “bring me the bottle of water”, a robotic listener's vision system might confuse the water with a bottle of cleaning detergent that happens to be nearby. In this case, the operator should be able to utter an interrupting repair command such as “No, erm... stop! No, not the detergent! I mean the water, to your right!” We refer to such commands as conversational action repair (CAR) commands. Previous dialog models for human-robot interaction did not support such commands.
Our first step to address CAR was to develop a goal-conditioned reinforcement learning approach based on hindsight learning. This improved the grounding capabilities for instruction-following. Our surprising main result was that our new self-speech feedback method can catalyze the learning process. Our second step was to extend the self-speech-based instruction-following with action repair commands, and we found that self-speech also improves the learning process in this case.
In addition to these results, we improved the Neuro-Inspired COLlaborator (NICOL), an adult-sized semi-humanoid based on our established NICO robot. We integrated our new ELMiRA (Embodying Language Models in Robot Action) architecture, which merges speech, vision-language, and object detection models with robot-specific spatial and motion models. This integration enables human-robot interaction and object manipulation tasks. To enhance sim-to-real transfer and imitation learning, we developed neural architectures using image-to-image transfer and differentiable forward kinematics.
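The hindsight learning mentioned above can be illustrated with a minimal sketch. It is not the project's implementation: the names `Transition`, `sparse_reward`, and `hindsight_relabel`, the string-valued goals, and the 0/-1 reward scheme are all assumptions made for illustration. The idea shown is only the general one: an episode that missed its instructed goal is stored a second time with the outcome that was actually achieved as the goal, so that it still provides a useful learning signal.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    """One step of an episode (hypothetical structure, for illustration)."""
    state: str
    action: str
    achieved: str  # outcome actually reached after the action
    goal: str      # goal the agent was instructed to reach
    reward: float


def sparse_reward(achieved: str, goal: str) -> float:
    """Assumed sparse reward: 0.0 when the goal is met, -1.0 otherwise."""
    return 0.0 if achieved == goal else -1.0


def hindsight_relabel(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the final achieved outcome."""
    if not episode:
        return []
    final_outcome = episode[-1].achieved
    return [
        replace(t, goal=final_outcome,
                reward=sparse_reward(t.achieved, final_outcome))
        for t in episode
    ]


# The agent was told to grasp the water but ended up grasping the detergent.
instructed = "grasp water"
episode = [
    Transition("s0", "reach", "nothing grasped", instructed,
               sparse_reward("nothing grasped", instructed)),
    Transition("s1", "close gripper", "grasp detergent", instructed,
               sparse_reward("grasp detergent", instructed)),
]
relabeled = hindsight_relabel(episode)
```

In this sketch the original episode keeps its failed goal, while the relabeled copy treats "grasp detergent" as if it had been the instruction, so its final step is rewarded. In a language-conditioned setting the relabeled goal would be an instruction describing the achieved outcome rather than a fixed string.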

Link to the final report

https://doi.org/10.15480/882.15751

Publications

Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics. 2022 IEEE International Conference on Development and Learning (ICDL), 170-177. IEEE.
Röder, Frank; Eppe, Manfred & Wermter, Stefan
Intelligent problem-solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1), 11-20.
Eppe, Manfred; Gumbsch, Christian; Kerzel, Matthias; Nguyen, Phuong D. H.; Butz, Martin V. & Wermter, Stefan
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections. Second Workshop on Language and Reinforcement Learning @ NeurIPS
Röder, Frank & Eppe, Manfred
Sim-to-Real Neural Learning with Domain Randomisation for Humanoid Robot Grasping. Lecture Notes in Computer Science, 342-354. Springer International Publishing.
Gäde, Connor; Kerzel, Matthias; Strahl, Erik & Wermter, Stefan
NICOL: A Neuro-Inspired Collaborative Semi-Humanoid Robot That Bridges Social Interaction and Reliable Manipulation. IEEE Access, 11, 123531-123542.
Kerzel, Matthias; Allgeuer, Philipp; Strahl, Erik; Frick, Nicolas; Habekost, Jan-Gerrit; Eppe, Manfred & Wermter, Stefan
Diffusing in Someone Else’s Shoes: Robotic Perspective-Taking with Diffusion. 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids), 141-148. IEEE.
Spisak, Josua; Kerzel, Matthias & Wermter, Stefan
Domain Adaption as Auxiliary Task for Sim-to-Real Transfer in Vision-based Neuro-Robotic Control. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE.
Gäde, Connor; Habekost, Jan-Gerrit & Wermter, Stefan
Embodying Language Models in Robot Action. ESANN 2024 proceedings, 625-630. Ciaco - i6doc.com.
Gäde, Connor; Özdemir, Ozan; Weber, Cornelius & Wermter, Stefan
Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7315-7322. IEEE.
Habekost, Jan-Gerrit; Gäde, Connor; Allgeuer, Philipp & Wermter, Stefan
Robotic Imitation of Human Actions. 2024 IEEE International Conference on Development and Learning (ICDL), 1-6. IEEE.
Spisak, Josua; Kerzel, Matthias & Wermter, Stefan
When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration. Lecture Notes in Computer Science, 306-321. Springer Nature Switzerland.
Allgeuer, Philipp; Ali, Hassan & Wermter, Stefan
Language Grounding in Deep Reinforcement Learning for Dynamic Goal-Oriented Robotics [Ph.D. thesis (in review)]. Hamburg University of Technology.
Röder, Frank
Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research. SoftwareX, 29, 102064.
Benad, Jan; Röder, Frank & Eppe, Manfred
