Safe Exploration Method for Reinforcement Learning under Existence of Disturbance

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference) Pub Date : 2022-09-30 DOI:10.48550/arXiv.2209.15452

Y. Okawa, Tomotake Sasaki, H. Yanami, T. Namerikawa

{"title":"Safe Exploration Method for Reinforcement Learning under Existence of Disturbance","authors":"Y. Okawa, Tomotake Sasaki, H. Yanami, T. Namerikawa","doi":"10.48550/arXiv.2209.15452","DOIUrl":null,"url":null,"abstract":"Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to safety-critical problems especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define the safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions to construct conservative inputs not containing an exploring aspect used in the proposed method and prove that the safety in the above explained sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.","PeriodicalId":74091,"journal":{"name":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","volume":"19 1","pages":"132-147"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2209.15452","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

Abstract

Recent rapid developments in reinforcement learning algorithms have been giving us novel possibilities in many fields. However, due to their exploring property, we have to take the risk into consideration when we apply those algorithms to safety-critical problems especially in real environments. In this study, we deal with a safe exploration problem in reinforcement learning under the existence of disturbance. We define the safety during learning as satisfaction of the constraint conditions explicitly defined in terms of the state and propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance. The proposed method assures the satisfaction of the explicit state constraints with a pre-specified probability even if the controlled object is exposed to a stochastic disturbance following a normal distribution. As theoretical results, we introduce sufficient conditions to construct conservative inputs not containing an exploring aspect used in the proposed method and prove that the safety in the above explained sense is guaranteed with the proposed method. Furthermore, we illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.

查看原文本刊更多论文

干扰存在下强化学习的安全探索方法

近年来，强化学习算法的快速发展为我们在许多领域提供了新的可能性。然而，由于它们的探索性，当我们将这些算法应用于安全关键问题时，特别是在现实环境中，我们必须考虑到风险。在本研究中，我们处理了一个存在干扰的强化学习中的安全探索问题。我们将学习过程中的安全性定义为满足以状态明确定义的约束条件，并提出了一种利用被控对象和干扰的部分先验知识的安全探索方法。该方法即使被控对象受到服从正态分布的随机扰动，也能保证以预先指定的概率满足显式状态约束。作为理论结果，我们引入了构造保守输入的充分条件，该输入不包含所提方法中使用的探索方面，并证明了所提方法保证了上述解释意义上的安全性。通过倒立摆和四杆并联机器人的数值仿真，验证了所提方法的有效性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Machine learning and knowledge discovery in databases : European Conference, ECML PKDD ... : proceedings. ECML PKDD (Conference)

自引率

0.00%

发文量