Safety Guided Policy Optimization
Dohyeong Kim, Yunho Kim, Kyungjae Lee, Songhwai Oh
2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2022
DOI: 10.1109/IROS47612.2022.9981030
Abstract
In reinforcement learning (RL), exploration is essential for reaching a globally optimal policy, but unconstrained exploration can cause damage to robots and nearby people. To address this safety issue during exploration, safe RL has been proposed to keep the agent within specified safety constraints while maximizing cumulative rewards. This paper introduces a new safe RL method that allows robots to operate under safety constraints while learning. The key component of the proposed method is the safeguard module. The safeguard predicts constraint values in the near future and corrects actions so that the predicted constraints are not violated. Since actions are safely modified by the safeguard during exploration and policies are trained to imitate the corrected actions, the agent can explore safely. Additionally, the safeguard is sample efficient because it does not require long-horizon trajectories for training, so constraints can be satisfied within a short number of time steps. The proposed method is extensively evaluated in simulation and in experiments on a real robot. The results show that the proposed method achieves the best performance while satisfying safety constraints with minimal environment interaction in all experiments.
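To make the safeguard idea from the abstract concrete, the following is a minimal sketch, not the authors' implementation: a toy linear policy proposes an action, a toy safeguard model predicts the near-future constraint cost of that action and corrects it when a violation is predicted, and the policy is then updated to imitate the corrected action. The environment, linear models, correction rule, and all hyperparameters are illustrative assumptions.

```python
# Hedged sketch of safeguard-corrected exploration with imitation of corrected actions.
# All models and constants below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 4, 2
COST_LIMIT = 0.5  # assumed per-step constraint threshold

W_pi = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))    # toy linear policy
W_sg = rng.normal(scale=0.1, size=(OBS_DIM + ACT_DIM,))  # toy safeguard cost predictor

def policy(obs):
    return W_pi @ obs

def predicted_cost(obs, act):
    # Safeguard prediction: estimated near-future constraint cost of taking `act` in `obs`.
    return float(W_sg @ np.concatenate([obs, act]))

def safeguard_correct(obs, act, step=0.1, iters=20):
    # Correct the proposed action with gradient steps on the predicted cost until the
    # predicted constraint is satisfied (a simple stand-in for the paper's correction rule).
    act = act.copy()
    for _ in range(iters):
        if predicted_cost(obs, act) <= COST_LIMIT:
            break
        grad = W_sg[OBS_DIM:]  # d(cost)/d(action) for the linear predictor
        act -= step * grad
    return act

def imitation_update(obs, corrected_act, lr=0.05):
    # Train the policy to imitate the safeguard-corrected action (squared-error loss).
    global W_pi
    err = policy(obs) - corrected_act
    W_pi -= lr * np.outer(err, obs)

for _ in range(100):  # toy interaction loop; the environment step is omitted here
    obs = rng.normal(size=OBS_DIM)
    act = policy(obs)
    safe_act = safeguard_correct(obs, act)
    imitation_update(obs, safe_act)
```

In this sketch the safeguard only needs (state, action, short-horizon cost) samples to fit its predictor, which mirrors the abstract's claim that no long-horizon trajectories are required for training it.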