Reinforcement Learning With Safety and Stability Guarantees During Exploration For Linear Systems

Zahra Marvi; Bahare Kiumarsi

DOI: 10.1109/OJCSYS.2022.3209945
Journal: IEEE Open Journal of Control Systems, vol. 1, pp. 322-334
Published: 2022-09-28
URL: https://ieeexplore.ieee.org/document/9904857/
Open-access PDF: https://ieeexplore.ieee.org/iel7/9552933/9683993/09904857.pdf
Citations: 1

Abstract

Satisfying the safety and stability properties of reinforcement learning (RL) algorithms has been a long-standing challenge. These properties must hold even during learning, when exploration is required to collect rich data. However, guaranteeing the safety of actions when little is known about the system dynamics is a daunting challenge: predicting the consequences of RL actions requires knowledge of the system dynamics. This paper presents a novel RL scheme that ensures the safety and stability of linear systems during both the exploration and exploitation phases. To this end, a fast and data-efficient model-learning method with a convergence guarantee is employed simultaneously with an off-policy RL scheme to find the optimal controller. An accurate bound on the model-learning error is derived, and this bound is used to construct a novel adaptive robustified control barrier function (ARCBF), which guarantees that the states of the system remain in the safe set even while learning is incomplete. Therefore, once a mild rank condition is satisfied, the noisy input in the exploratory data-collection phase and the optimal controller in the exploitation phase are minimally altered so that the ARCBF criterion is satisfied, and safety is thereby guaranteed in both phases. It is shown that, under the proposed RL framework, the model-learning error acts as a vanishing perturbation to the original system; consequently, a stability guarantee is provided even during exploration, when noisy random inputs are applied to the system.
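The abstract describes two coupled components: a data-efficient model-learning step and a barrier-function-based safety filter that minimally alters both the noisy exploratory inputs and the learned optimal controller. The sketch below only illustrates that general pattern under simplifying assumptions; it is not the paper's ARCBF construction. It uses ordinary batch least squares for the linear model, a single half-space safe set, a standard discrete-time CBF decrease condition, and a fixed robustness margin in place of the paper's adaptive model-error bound. The function and parameter names (estimate_dynamics, cbf_safety_filter, gamma, err_margin) are hypothetical.

```python
import numpy as np

def estimate_dynamics(X0, U, X1):
    """Batch least-squares estimate of (A, B) for x_{k+1} = A x_k + B u_k.

    X0: (n, N) states, U: (m, N) inputs, X1: (n, N) next states.
    A persistently exciting input (the 'mild rank condition' mentioned in the
    abstract) keeps the regressor full rank so the estimate is well defined.
    """
    Z = np.vstack([X0, U])              # regressor [x_k; u_k]
    Theta = X1 @ np.linalg.pinv(Z)      # [A_hat, B_hat]
    n = X0.shape[0]
    return Theta[:, :n], Theta[:, n:]

def cbf_safety_filter(x, u_nom, A_hat, B_hat, c, d, gamma=0.1, err_margin=0.0):
    """Minimally alter u_nom so the half-space safe set {x : c^T x + d >= 0}
    satisfies a discrete-time CBF condition under the learned model:

        h(A_hat x + B_hat u) >= (1 - gamma) * h(x) + err_margin,

    where err_margin is a fixed stand-in for the paper's adaptive
    model-learning-error bound.  With one linear constraint, the QP
    min ||u - u_nom||^2 has the closed-form projection used below
    (the degenerate case B_hat^T c ~ 0 is not handled in this sketch).
    """
    h = float(c @ x + d)
    a = B_hat.T @ c                                  # constraint direction on u
    b = (1.0 - gamma) * h - float(c @ (A_hat @ x)) - d + err_margin
    slack = float(a @ u_nom) - b
    if slack >= 0.0:                                 # nominal input already safe
        return u_nom
    return u_nom + (-slack) * a / float(a @ a)       # minimal correction

# Toy usage: a double integrator with safe set x1 <= 1, i.e. h(x) = 1 - x1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
rng = np.random.default_rng(0)
X0 = rng.standard_normal((2, 50))
U = rng.standard_normal((1, 50))
X1 = A @ X0 + B @ U + 1e-3 * rng.standard_normal((2, 50))
A_hat, B_hat = estimate_dynamics(X0, U, X1)

x = np.array([0.9, 0.5])                 # close to the constraint boundary
u_explore = np.array([2.0])              # noisy exploratory input
u_safe = cbf_safety_filter(x, u_explore, A_hat, B_hat,
                           c=np.array([-1.0, 0.0]), d=1.0, err_margin=0.05)
print(u_safe)
```

The same filter can be applied to the exploitation-phase controller by passing the learned optimal input as u_nom; the paper's minimal-alteration step plays an analogous role, but with the adaptive, error-aware barrier condition instead of the fixed margin used here.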