Safety Guided Policy Optimization

Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh
{"title":"安全导向的政策优化","authors":"Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh","doi":"10.1109/IROS47612.2022.9981030","DOIUrl":null,"url":null,"abstract":"In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy but unconstrained exploration can cause damages to robots and nearby people. To handle this safety issue in exploration, safe RL has been proposed to keep the agent under the specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method which can be applied to robots to operate under the safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can safely explore. Additionally, the safeguard is sample efficient as it does not require long horizontal trajectories for training, so constraints can be satisfied within short time steps. The proposed method is extensively evaluated in simulation and experiments using a real robot. The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal interaction with environments in all experiments.","PeriodicalId":431373,"journal":{"name":"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Safety Guided Policy Optimization\",\"authors\":\"Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh\",\"doi\":\"10.1109/IROS47612.2022.9981030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy but unconstrained exploration can cause damages to robots and nearby people. To handle this safety issue in exploration, safe RL has been proposed to keep the agent under the specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method which can be applied to robots to operate under the safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can safely explore. Additionally, the safeguard is sample efficient as it does not require long horizontal trajectories for training, so constraints can be satisfied within short time steps. The proposed method is extensively evaluated in simulation and experiments using a real robot. 
The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal interaction with environments in all experiments.\",\"PeriodicalId\":431373,\"journal\":{\"name\":\"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-10-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IROS47612.2022.9981030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IROS47612.2022.9981030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In reinforcement learning (RL), exploration is essential to achieve a globally optimal policy, but unconstrained exploration can cause damage to robots and nearby people. To handle this safety issue during exploration, safe RL has been proposed to keep the agent within specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method that allows robots to operate under safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts the constraints in the near future and corrects actions such that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can explore safely. Additionally, the safeguard is sample efficient as it does not require long-horizon trajectories for training, so constraints can be satisfied within a small number of time steps. The proposed method is extensively evaluated in simulation and in experiments using a real robot. The results show that the proposed method achieves the best performance while satisfying the safety constraints with minimal interaction with the environment in all experiments.
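
The abstract describes the safeguard as a module that predicts near-future constraint values and corrects the policy's proposed action so that the prediction stays within the limit, with the policy then trained to imitate the corrected actions. Below is a minimal Python/PyTorch sketch of that general idea; it is only an illustration under assumed details, not the authors' implementation, and every name in it (CostNet, correct_action, the threshold and step sizes) is hypothetical.

    # Hypothetical sketch: a learned constraint critic plus gradient-based action
    # correction. This is NOT the paper's implementation, only an illustration of
    # "predict near-future constraints and correct actions so they are not violated".
    import torch
    import torch.nn as nn

    class CostNet(nn.Module):
        """Predicts the near-future constraint cost of taking action a in state s."""
        def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            return self.net(torch.cat([state, action], dim=-1))

    def correct_action(cost_net: CostNet, state: torch.Tensor, action: torch.Tensor,
                       threshold: float = 0.0, steps: int = 20, lr: float = 0.05) -> torch.Tensor:
        """Nudge the proposed action until the predicted constraint cost is below threshold."""
        a = action.clone().detach().requires_grad_(True)
        for _ in range(steps):
            cost = cost_net(state, a)
            if cost.item() <= threshold:   # predicted constraint already satisfied
                break
            cost.backward()
            with torch.no_grad():
                a -= lr * a.grad           # gradient step that lowers the predicted cost
                a.clamp_(-1.0, 1.0)        # keep the action inside its valid range
            a.grad.zero_()
        return a.detach()

    # The policy can then be trained to imitate the corrected (safe) actions,
    # e.g. with a simple behavior-cloning loss:
    #   loss = ((policy(state) - corrected_action) ** 2).mean()

The gradient-based correction above is just one plausible way to realize "correct actions such that the predicted constraints are not violated"; the paper may use a different correction rule, so treat this only as a conceptual aid.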