Learning Conversational Action Repair for Intelligent Robots
Final Report Abstract
What are the principal mechanisms required to capture the robustness and interactivity of human communication, given the situational, noisy and often ambiguous nature of natural language? And how, and to what extent, can we integrate these mechanisms within an embodied functional model that is computationally and empirically verifiable? We addressed these research questions by investigating the linguistic phenomenon of conversational repair (CR), a method to edit and re-interpret previously uttered sentences that the hearer did not understand correctly. Previous computational models for human-robot dialog consider non-understandings, but they do not consider misunderstandings. Misunderstandings are common in natural language communication: they can result from inconsistent world models, erroneous perceptions, or ambiguous instructions. Addressing misunderstandings is important because they can cause a robot to execute unintended, potentially irreversible and destructive actions. For example, given the instruction “bring me the bottle of water”, a robotic listener's vision system might confuse the water with a bottle of cleaning detergent that happens to be nearby. In this case, the operator should be able to utter an interrupting repair command such as “No, erm... stop! No, not the detergent! I mean the water, to your right!” We refer to such commands as conversational action repair (CAR) commands. Previous dialog models for human-robot interaction did not support such commands. Our first step to address CAR was to develop a goal-conditioned reinforcement learning approach based on hindsight learning. This improved the grounding capabilities for instruction-following. Our surprising main result was that our new self-speech feedback method can catalyze the learning process. Our second step was to extend the self-speech-based instruction-following with action repair commands, and we found that self-speech also improves the learning process in this case.
In addition to these results, we improved the Neuro-Inspired COLlaborator (NICOL), an adult-sized semi-humanoid based on our established NICO robot. We integrated our new ELMiRA (Embodying Language Models in Robot Action) architecture, which merges speech, vision-language, and object detection with robot-specific spatial and motion models. This integration enables human-robot interaction and object manipulation tasks. To enhance sim-to-real transfer and imitation learning, we developed neural architectures using image-to-image transfer and differentiable forward kinematics.
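The hindsight learning idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the `Transition` fields, the `reward_for` function, and the exact-match reward are assumptions made purely for illustration. The sketch shows the general relabeling principle, in which a failed episode is re-used as a successful one by replacing the original instruction with a description of what the agent actually achieved.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    """One step of an episode (hypothetical structure for illustration)."""
    observation: str
    action: str
    instruction: str  # instruction the agent was conditioned on
    achieved: str     # description of the outcome reached after the action
    reward: float


def reward_for(instruction: str, achieved: str) -> float:
    """Sparse reward: 1.0 only if the achieved outcome matches the instruction."""
    return 1.0 if instruction == achieved else 0.0


def relabel_with_hindsight(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode with the outcome reached at its final step.

    The final achieved outcome is treated as if it had been the instruction
    all along, and rewards are recomputed against that hindsight instruction.
    """
    if not episode:
        return []
    hindsight_instruction = episode[-1].achieved
    return [
        replace(
            step,
            instruction=hindsight_instruction,
            reward=reward_for(hindsight_instruction, step.achieved),
        )
        for step in episode
    ]


# The agent was told to grasp the water but grasped the detergent instead.
episode = [
    Transition("bottles on table", "reach", "grasp the water", "nothing grasped", 0.0),
    Transition("hand near bottle", "close gripper", "grasp the water", "grasp the detergent", 0.0),
]
relabeled = relabel_with_hindsight(episode)
for step in relabeled:
    print(step.instruction, step.reward)
```

Under these assumptions, the original episode carries no reward signal, while the relabeled copy rewards the final step for the instruction "grasp the detergent". Both versions could then be stored in a replay buffer for off-policy training.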
Link to the final report
https://doi.org/10.15480/882.15751
Publications
-
Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics. 2022 IEEE International Conference on Development and Learning (ICDL), 170-177. IEEE.
Röder, Frank; Eppe, Manfred & Wermter, Stefan
-
Intelligent problem-solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1), 11-20.
Eppe, Manfred; Gumbsch, Christian; Kerzel, Matthias; Nguyen, Phuong D. H.; Butz, Martin V. & Wermter, Stefan
-
Language-Conditioned Reinforcement Learning to Solve Misunderstandings with Action Corrections. Second Workshop on Language and Reinforcement Learning @ NeurIPS
Röder, Frank & Eppe, Manfred
-
Sim-to-Real Neural Learning with Domain Randomisation for Humanoid Robot Grasping. Lecture Notes in Computer Science, 342-354. Springer International Publishing.
Gäde, Connor; Kerzel, Matthias; Strahl, Erik & Wermter, Stefan
-
NICOL: A Neuro-Inspired Collaborative Semi-Humanoid Robot That Bridges Social Interaction and Reliable Manipulation. IEEE Access, 11, 123531-123542.
Kerzel, Matthias; Allgeuer, Philipp; Strahl, Erik; Frick, Nicolas; Habekost, Jan-Gerrit; Eppe, Manfred & Wermter, Stefan
-
Diffusing in Someone Else’s Shoes: Robotic Perspective-Taking with Diffusion. 2024 IEEE-RAS 23rd International Conference on Humanoid Robots (Humanoids), 141-148. IEEE.
Spisak, Josua; Kerzel, Matthias & Wermter, Stefan
-
Domain Adaption as Auxiliary Task for Sim-to-Real Transfer in Vision-based Neuro-Robotic Control. 2024 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE.
Gäde, Connor; Habekost, Jan-Gerrit & Wermter, Stefan
-
Embodying Language Models in Robot Action. ESANN 2024 proceedings, 625-630. Ciaco - i6doc.com.
Gäde, Connor; Özdemir, Ozan; Weber, Cornelius & Wermter, Stefan
-
Inverse Kinematics for Neuro-Robotic Grasping with Humanoid Embodied Agents. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 7315-7322. IEEE.
Habekost, Jan-Gerrit; Gäde, Connor; Allgeuer, Philipp & Wermter, Stefan
-
Robotic Imitation of Human Actions. 2024 IEEE International Conference on Development and Learning (ICDL), 1-6. IEEE.
Spisak, Josua; Kerzel, Matthias & Wermter, Stefan
-
When Robots Get Chatty: Grounding Multimodal Human-Robot Conversation and Collaboration. Lecture Notes in Computer Science, 306-321. Springer Nature Switzerland.
Allgeuer, Philipp; Ali, Hassan & Wermter, Stefan
-
Language Grounding in Deep Reinforcement Learning for Dynamic Goal-Oriented Robotics [Ph.D. thesis (in review)]. Hamburg University of Technology.
Röder, Frank
-
Scilab-RL: A software framework for efficient reinforcement learning and cognitive modeling research. SoftwareX, 29, 102064.
Benad, Jan; Röder, Frank & Eppe, Manfred
