Project Details

Robot learning to perceive, plan, and act under uncertainty

Applicant Professor Jan Reinhard Peters, Ph.D., since 11/2019
Subject Area Automation, Mechatronics, Control Systems, Intelligent Technical Systems, Robotics
Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2018 to 2022
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 398611747
 
Final Report Year 2022

Final Report Abstract

This project investigated reinforcement learning (RL), a class of algorithms that learn from past interaction with a dynamic environment, in partially observable domains, i.e., settings in which the state of the environment cannot be fully observed. As an intuitive example, imagine navigating blindfolded through an unknown room: sensory observations such as touching objects reveal local information, but knowledge of the room as a whole must be actively gathered. Such partially observable systems arise naturally in many robotic scenarios. Solving partially observable tasks via reinforcement learning requires memory to retain past observations and dedicated planning to gather new information. Both requirements pose challenges for today’s reinforcement learning algorithms, and this project set out to tackle them. Throughout the project, we proposed and investigated new value-propagation concepts for tree search, a well-known planning algorithm that is often combined with reinforcement learning to improve performance. These concepts improve the trade-off between exploiting solutions that have already been found (but may be sub-optimal) and searching for alternatives. We also compared different memory representations for model-free deep reinforcement learning agents. Furthermore, we developed approaches for stabilizing the training of RL agents on challenging learning tasks, such as partially observable domains: these approaches start from simpler versions of a task and adapt its complexity to the learning progress of the RL agent. We evaluated the developed methods on different robotic tasks, including one in which a robotic arm had to remove an object from a confined space through an exit at an unknown location using only collision information, similar in spirit to the initial example of blindfolded navigation.
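
To make the value-propagation idea concrete, the following minimal sketch contrasts the plain visit-weighted average backup of standard UCT with a power-mean backup in the spirit of the generalized mean estimation publication listed below. The code, including the Node class, the toy Bernoulli bandit, and the exponent p, is an illustrative assumption and not the published algorithm.

    import math
    import random

    def power_mean(values, weights, p):
        """Weighted power mean of non-negative values. p = 1 recovers the
        visit-weighted average used in plain UCT; larger p shifts the
        estimate toward the best child, i.e., toward a max-backup."""
        total = sum(weights)
        return sum((w / total) * v ** p for v, w in zip(values, weights)) ** (1.0 / p)

    class Node:
        """One decision node with per-action statistics, as in UCT."""
        def __init__(self, n_actions):
            self.visits = [0] * n_actions    # visit count per action
            self.values = [0.0] * n_actions  # running mean return per action

        def select(self, c=1.4):
            """UCB1-style action selection; unvisited actions come first."""
            total = sum(self.visits) + 1
            def ucb(a):
                if self.visits[a] == 0:
                    return float("inf")
                return self.values[a] + c * math.sqrt(math.log(total) / self.visits[a])
            return max(range(len(self.visits)), key=ucb)

        def update(self, action, ret):
            """Incremental mean update after a simulated rollout."""
            self.visits[action] += 1
            self.values[action] += (ret - self.values[action]) / self.visits[action]

        def backup_value(self, p=4.0):
            """Value propagated to the parent: a power mean over visited
            children instead of the plain visit-weighted average."""
            stats = [(v, n) for v, n in zip(self.values, self.visits) if n > 0]
            if not stats:
                return 0.0
            vals, counts = zip(*stats)
            return power_mean(vals, counts, p)

    if __name__ == "__main__":
        random.seed(0)
        node = Node(n_actions=3)
        success_prob = [0.2, 0.5, 0.8]  # toy Bernoulli bandit standing in for subtrees
        for _ in range(1000):
            a = node.select()
            reward = 1.0 if random.random() < success_prob[a] else 0.0
            node.update(a, reward)
        print("average backup   :", node.backup_value(p=1.0))
        print("power-mean backup:", node.backup_value(p=4.0))

Raising p interpolates between the average backup (p = 1) and a maximum backup (p → ∞), which is one way to tune the trade-off between exploiting found solutions and searching for alternatives described above.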

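The curriculum idea can likewise be illustrated with a toy control loop. The published self-paced methods listed below derive the task-distribution update from an optimization problem (e.g., with KL or optimal-transport constraints); the scalar context, the threshold rule, and the faked agent in this sketch are strong simplifications chosen only to show the mechanism.

    import random

    class SelfPacedCurriculum:
        """Keeps a scalar task parameter ('context') and moves it from an
        easy setting toward the target task whenever the agent's recent
        performance clears a threshold."""
        def __init__(self, easy_ctx, target_ctx, threshold=0.7, step=0.1):
            self.ctx = easy_ctx        # current task parameter, e.g. exit width
            self.target = target_ctx   # parameter of the task we actually care about
            self.threshold = threshold # required mean return before progressing
            self.step = step           # fraction of the remaining gap to close

        def sample_context(self):
            """Draw a training task around the current difficulty level."""
            return random.gauss(self.ctx, 0.05)

        def update(self, mean_return):
            """Advance the curriculum only when the agent is good enough."""
            if mean_return >= self.threshold:
                self.ctx += self.step * (self.target - self.ctx)

    if __name__ == "__main__":
        random.seed(0)
        # Toy scenario: a wide exit (ctx = 1.0) is easy to find, the target
        # task has a narrow exit (ctx = 0.2). The 'agent' is faked by a
        # skill level that slowly increases with training.
        curriculum = SelfPacedCurriculum(easy_ctx=1.0, target_ctx=0.2)
        skill = 0.0
        for epoch in range(200):
            ctx = curriculum.sample_context()
            difficulty = 1.0 - ctx                 # narrower exit = harder
            mean_return = 1.0 if skill >= difficulty else 0.3
            curriculum.update(mean_return)
            skill = min(1.0, skill + 0.01)         # the fake agent improves
        print("final context:", round(curriculum.ctx, 3))  # approaches the target 0.2

The stall-and-advance behaviour of this loop, where difficulty only increases once performance allows it, is the stabilizing property the abstract refers to; in the published work the context distribution is multivariate and the agent is a full RL learner.
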
Publications

  • A Probabilistic Interpretation of Self-Paced Learning with Application to Reinforcement Learning. Journal of Machine Learning Research 22(182), 1-52
    P. Klink, H. Abdulsamad, B. Belousov, C. D’Eramo, J. Peters & J. Pajarinen
  • Self-Paced Contextual Reinforcement Learning. Conference on Robot Learning (CoRL) 2019
    P. Klink, H. Abdulsamad, B. Belousov & J. Peters
  • Generalized Mean Estimation in Monte-Carlo Tree Search. International Joint Conference on Artificial Intelligence (IJCAI) 2020
    T. Dam, P. Klink, C. D’Eramo, J. Peters & J. Pajarinen
  • Self-Paced Deep Reinforcement Learning. Advances in Neural Information Processing Systems (NeurIPS) 2021
    P. Klink, C. D’Eramo, J. Peters & J. Pajarinen
  • Boosted Curriculum Reinforcement Learning. International Conference on Learning Representations (ICLR) 2022
    P. Klink, C. D’Eramo, J. Peters & J. Pajarinen
  • Convex Regularization in Monte-Carlo Tree Search. International Conference on Machine Learning (ICML) 2022
    T. Dam, C. D’Eramo, J. Peters & J. Pajarinen
  • Curriculum Reinforcement Learning via Constrained Optimal Transport. International Conference on Machine Learning (ICML) 2022
    P. Klink, H. Yang, C. D’Eramo, J. Pajarinen & J. Peters
  • Monte-Carlo Robot Path Planning. IEEE Robotics and Automation Letters 7(4), 11213-11220
    T. Dam, G. Chalvatzaki, J. Peters & J. Pajarinen