Safety Guided Policy Optimization

Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh
{"title":"安全导向的政策优化","authors":"Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh","doi":"10.1109/IROS47612.2022.9981030","DOIUrl":null,"url":null,"abstract":"In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy but unconstrained exploration can cause damages to robots and nearby people. To handle this safety issue in exploration, safe RL has been proposed to keep the agent under the specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method which can be applied to robots to operate under the safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can safely explore. Additionally, the safeguard is sample efficient as it does not require long horizontal trajectories for training, so constraints can be satisfied within short time steps. The proposed method is extensively evaluated in simulation and experiments using a real robot. The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal interaction with environments in all experiments.","PeriodicalId":431373,"journal":{"name":"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Safety Guided Policy Optimization\",\"authors\":\"Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh\",\"doi\":\"10.1109/IROS47612.2022.9981030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy but unconstrained exploration can cause damages to robots and nearby people. To handle this safety issue in exploration, safe RL has been proposed to keep the agent under the specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method which can be applied to robots to operate under the safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can safely explore. Additionally, the safeguard is sample efficient as it does not require long horizontal trajectories for training, so constraints can be satisfied within short time steps. The proposed method is extensively evaluated in simulation and experiments using a real robot. 
The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal interaction with environments in all experiments.\",\"PeriodicalId\":431373,\"journal\":{\"name\":\"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IROS47612.2022.9981030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IROS47612.2022.9981030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy, but unconstrained exploration can cause damage to robots and nearby people. To handle this safety issue during exploration, safe RL has been proposed to keep the agent within specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method that allows robots to operate under safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can explore safely. Additionally, the safeguard is sample efficient as it does not require long-horizon trajectories for training, so constraints can be satisfied within a small number of time steps. The proposed method is extensively evaluated in simulation and in experiments using a real robot. The results show that the proposed method achieves the best performance while satisfying the safety constraints with minimal interaction with the environment in all experiments.
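
The abstract describes the safeguard as a module that predicts near-future constraint values and corrects the policy's proposed action so that the prediction stays within the limit, with the policy then trained to imitate the corrected actions. Below is a minimal Python/PyTorch sketch of that general idea; it is only an illustration under assumed details, not the authors' implementation, and every name in it (CostNet, correct_action, the threshold and step sizes) is hypothetical.

    # Hypothetical sketch: a learned constraint critic plus gradient-based action
    # correction. This is NOT the paper's implementation, only an illustration of
    # "predict near-future constraints and correct actions so they are not violated".
    import torch
    import torch.nn as nn

    class CostNet(nn.Module):
        """Predicts the near-future constraint cost of taking action a in state s."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, action], dim=-1))

    def correct_action(cost_net: CostNet, state: torch.Tensor, action: torch.Tensor,
                       threshold: float = 0.0, steps: int = 20, lr: float = 0.05) -> torch.Tensor:
        """Nudge the proposed action until the predicted constraint cost is below threshold."""
        a = action.clone().detach().requires_grad_(True)
        for _ in range(steps):
            cost = cost_net(state, a)
            if cost.item() <= threshold:   # predicted constraint already satisfied
                break
            cost.backward()
            with torch.no_grad():
                a -= lr * a.grad           # gradient step that lowers the predicted cost
                a.clamp_(-1.0, 1.0)        # keep the action inside its valid range
            a.grad.zero_()
        return a.detach()

    # The policy can then be trained to imitate the corrected (safe) actions,
    # e.g. with a simple behavior-cloning loss:
    #   loss = ((policy(state) - corrected_action) ** 2).mean()

The gradient-based correction above is just one plausible way to realize "correct actions such that the predicted constraints are not violated"; the paper may use a different correction rule, so treat this only as a conceptual aid.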