Safe Reinforcement Learning of Robot Trajectories in the Presence of Moving Obstacles

IF 4.6 | Zone 2, Computer Science | JCR Q2, ROBOTICS

IEEE Robotics and Automation Letters, vol. 9, no. 12, pp. 11353-11360 | Pub Date: 2024-10-30 | DOI: 10.1109/LRA.2024.3488402 | https://ieeexplore.ieee.org/document/10738380/

Jonas Kiemel; Ludovic Righetti; Torsten Kröger; Tamim Asfour

Citations: 0

Abstract

In this paper, we present an approach for learning collision-free robot trajectories in the presence of moving obstacles. As a first step, we train a backup policy to generate evasive movements from arbitrary initial robot states using model-free reinforcement learning. When learning policies for other tasks, the backup policy can be used to estimate the potential risk of a collision and to offer an alternative action if the estimated risk is considered too high. No matter which action is selected, our action space ensures that the kinematic limits of the robot joints are not violated. We analyze and evaluate two different methods for estimating the risk of a collision. A physics simulation performed in the background is computationally expensive but provides the best results in deterministic environments. If a data-based risk estimator is used instead, the computational effort is significantly reduced, but an additional source of error is introduced. For evaluation, we successfully learn a reaching task and a basketball task while keeping the risk of collisions low. The results demonstrate the effectiveness of our approach for deterministic and stochastic environments, including a human-robot scenario and a ball environment, where no state can be considered permanently safe. By conducting experiments with a real robot, we show that our approach can generate safe trajectories in real time.
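The abstract describes the safety mechanism only at a high level: a backup policy proposes evasive motions, a risk estimate decides whether the task policy's action is accepted, and otherwise the backup action is executed. The Python sketch below illustrates that selection logic under loudly stated assumptions: a toy 1-D point robot and one obstacle moving at constant velocity (a deterministic environment). All names (`State`, `step`, `backup_policy`, `rollout_risk`, `shielded_action`), the hand-written backup policy, the binary risk value, the horizon and the 0.5 threshold are hypothetical illustrations, not the authors' implementation; in the paper the backup policy is learned with model-free reinforcement learning and the background simulation is a physics simulation of the robot.

```python
from dataclasses import dataclass
from typing import Callable

DT = 0.1          # control period [s] (assumed)
MAX_ACC = 2.0     # acceleration bound of the toy robot [m/s^2] (assumed)
SAFE_DIST = 0.5   # distance treated as a collision [m] (assumed)


@dataclass(frozen=True)
class State:
    robot_pos: float
    robot_vel: float
    obstacle_pos: float
    obstacle_vel: float  # constant-velocity obstacle: a deterministic environment


def step(s: State, acc: float) -> State:
    """Toy 1-D dynamics standing in for the background physics simulation."""
    acc = max(-MAX_ACC, min(MAX_ACC, acc))
    vel = s.robot_vel + acc * DT
    return State(s.robot_pos + vel * DT, vel,
                 s.obstacle_pos + s.obstacle_vel * DT, s.obstacle_vel)


def collides(s: State) -> bool:
    return abs(s.robot_pos - s.obstacle_pos) < SAFE_DIST


def backup_policy(s: State) -> float:
    """Hypothetical stand-in for the learned evasive policy:
    accelerate away from the obstacle at full effort."""
    return MAX_ACC if s.robot_pos >= s.obstacle_pos else -MAX_ACC


def rollout_risk(s: State, action: float, horizon: int = 20) -> float:
    """Simulation-based risk: apply the candidate action once, then let the
    backup policy take over; return 1.0 if any simulated state collides."""
    s = step(s, action)
    if collides(s):
        return 1.0
    for _ in range(horizon):
        s = step(s, backup_policy(s))
        if collides(s):
            return 1.0
    return 0.0


def shielded_action(s: State, task_action: float,
                    risk_fn: Callable[[State, float], float] = rollout_risk,
                    threshold: float = 0.5) -> float:
    """Keep the task policy's action unless its estimated risk is too high,
    in which case the backup policy's action is used instead."""
    if risk_fn(s, task_action) > threshold:
        return backup_policy(s)
    return task_action
```

In this sketch a data-based risk estimator would simply be passed as `risk_fn` in place of `rollout_risk`: it avoids running the simulation at every control step but, as the abstract notes, its prediction errors become an additional source of error.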
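The statement that no selected action violates the kinematic joint limits can be illustrated by letting the policy output a normalized value in [-1, 1] and mapping it onto an interval of accelerations that is recomputed from the current joint state at every step. The Python sketch below does this for a single joint under simplifying assumptions that are not taken from the paper: constant acceleration within each control period, only position, velocity and acceleration limits (no jerk limit), limits checked at the control time steps, and a braking check that assumes the joint can decelerate at the maximum acceleration afterwards. The function names (`feasible_acc_range`, `action_to_acc`, `rollout`) and the limit values are hypothetical.

```python
import random


def feasible_acc_range(q, v, q_min, q_max, v_max, a_max, dt, iters=60):
    """Return (lo, hi), the accelerations for the next control step that keep
    one joint inside its limits (simplified model: no jerk limit)."""
    # Acceleration and velocity limits give a plain interval.
    lo = max(-a_max, (-v_max - v) / dt)
    hi = min(a_max, (v_max - v) / dt)

    def next_state(a):
        # Constant acceleration a over one control period.
        return q + v * dt + 0.5 * a * dt ** 2, v + a * dt

    def stop_up(a):
        # Where the joint comes to rest if it brakes at a_max after this step
        # (only relevant while it moves towards q_max).
        q_n, v_n = next_state(a)
        return q_n + max(v_n, 0.0) ** 2 / (2.0 * a_max)

    def stop_down(a):
        # Same check towards q_min.
        q_n, v_n = next_state(a)
        return q_n - max(-v_n, 0.0) ** 2 / (2.0 * a_max)

    # Both stopping positions grow monotonically with a, so bisection finds
    # the largest a with stop_up(a) <= q_max ...
    if stop_up(hi) > q_max:
        ok, bad = lo, hi
        for _ in range(iters):
            mid = 0.5 * (ok + bad)
            if stop_up(mid) <= q_max:
                ok = mid
            else:
                bad = mid
        hi = ok
    # ... and the smallest a with stop_down(a) >= q_min.
    if stop_down(lo) < q_min:
        bad, ok = lo, hi
        for _ in range(iters):
            mid = 0.5 * (ok + bad)
            if stop_down(mid) >= q_min:
                ok = mid
            else:
                bad = mid
        lo = ok
    return lo, hi


def action_to_acc(action, lo, hi):
    """Map a normalized policy output in [-1, 1] linearly onto [lo, hi]."""
    action = max(-1.0, min(1.0, action))
    return lo + 0.5 * (action + 1.0) * (hi - lo)


def rollout(actions, q_min=-1.0, q_max=1.0, v_max=1.0, a_max=2.0, dt=0.05):
    """Integrate one joint from rest at q = 0 under arbitrary actions and
    return the visited (q, v, a) triples."""
    q, v, traj = 0.0, 0.0, []
    for action in actions:
        lo, hi = feasible_acc_range(q, v, q_min, q_max, v_max, a_max, dt)
        a = action_to_acc(action, lo, hi)
        q, v = q + v * dt + 0.5 * a * dt ** 2, v + a * dt
        traj.append((q, v, a))
    return traj


if __name__ == "__main__":
    # Push hard against both position limits, then act randomly.
    rng = random.Random(0)
    actions = [1.0] * 300 + [-1.0] * 300 + [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    traj = rollout(actions)
    print("q range:", min(t[0] for t in traj), max(t[0] for t in traj))
    print("max |v|:", max(abs(t[1]) for t in traj))
```

The bisection relies on the stopping position being monotone in the chosen acceleration. Because the previous step already ensured that braking at the maximum deceleration keeps the joint inside its range, the end of the search interval it starts from is feasible, so for a valid start state and a position range that is wide compared with the braking distance the returned interval is never empty, and every action in [-1, 1] maps to an admissible acceleration.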

Source journal

IEEE Robotics and Automation Letters (Computer Science - Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

About the journal: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.