Learning optimal safety certificates for unknown nonlinear control systems

Impact factor: 1.8 · JCR Q3 · Automation & Control Systems

IFAC Journal of Systems and Control, Volume 31, Article 100300 · Pub Date: 2025-03-01 · DOI: 10.1016/j.ifacsc.2025.100300

Pouria Tooranjipour, Bahare Kiumarsi

Citations: 0

Abstract

Learning optimal safety certificates for unknown nonlinear control systems

This paper introduces a novel approach to designing safe optimal controllers that avoid destructive conflicts between safety and performance over a large domain of the system’s operation. In general, it is impossible to design computationally tractable feedback controllers that respect safety for a given set. The best one can do in this case is to maximize the region inside the safe set on which both safety and optimality hold. To this end, our key contribution is the construction of a safe optimal domain of attraction (DoA) that ensures optimal convergence of the system’s trajectories to the origin without violating safety. To accomplish this, we leverage the relaxed Hamilton–Jacobi–Bellman (HJB) equation, which allows us to learn the most permissive control barrier certificates (CBCs) with a maximum-volume conflict-free set by solving a tractable optimization problem. To improve computational efficiency, we present a sum-of-squares (SOS)-based algorithm that breaks the optimization problem into smaller SOS programs at each iteration. To remove the need for a system model when solving these SOS programs, an SOS-based off-policy reinforcement learning (RL) method is presented. This off-policy approach evaluates a target policy that differs from the behavior policy used for data collection, which ensures safe exploration under mild assumptions. Finally, simulation results demonstrate the efficacy of the proposed method.
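To make the abstract's vocabulary concrete, the following is a minimal toy sketch and not the paper's SOS algorithm. It assumes a hypothetical scalar system dx/dt = x + u with running cost x^2 + u^2 and safe set {x : h(x) = 1 - x^2 >= 0}; the system, the cost, the barrier function h, and all function names are illustrative assumptions. For a candidate value function V(x) = c*x^2, the relaxed HJB condition in its standard textbook form is the inequality H(V) = V'(x)f(x) + q(x) - (1/4)(V'(x)g(x))^2 <= 0, and the barrier condition is dh/dt >= -alpha*h under the policy u = -(1/2)V'(x) = -c*x.

```python
import math

# Hypothetical toy problem (an assumption, not taken from the paper):
#   dynamics  dx/dt = f(x) + g(x) u  with f(x) = x, g(x) = 1
#   cost      integral of q(x) + u^2 with q(x) = x^2
#   safe set  {x : h(x) = 1 - x^2 >= 0}
# The exact HJB solution is V*(x) = P_STAR * x^2 with P_STAR = 1 + sqrt(2).
P_STAR = 1.0 + math.sqrt(2.0)
TOL = 1e-12  # tolerance for floating-point round-off


def grid(samples=201):
    """Evenly spaced sample points covering the safe set [-1, 1]."""
    return [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]


def hjb_residual(c, x):
    """H(V) for V(x) = c*x^2: V'(x)*f(x) + q(x) - 0.25*(V'(x)*g(x))^2."""
    dV = 2.0 * c * x
    return dV * x + x * x - 0.25 * dV * dV


def satisfies_relaxed_hjb(c):
    """Relaxed HJB: the residual only needs to be <= 0, not exactly 0."""
    return all(hjb_residual(c, x) <= TOL for x in grid())


def barrier_condition_holds(c, alpha=1.0):
    """Check dh/dt >= -alpha*h on the grid under the policy u = -c*x."""
    for x in grid():
        x_dot = x + (-c * x)
        h = 1.0 - x * x
        h_dot = -2.0 * x * x_dot
        if h_dot < -alpha * h - TOL:
            return False
    return True


def simulate(c, x0, dt=1e-3, steps=5000):
    """Forward-Euler rollout; returns (final state, smallest h seen)."""
    x, min_h = x0, 1.0 - x0 * x0
    for _ in range(steps):
        x += dt * (x - c * x)
        min_h = min(min_h, 1.0 - x * x)
    return x, min_h


if __name__ == "__main__":
    for c in (0.5, 2.0, P_STAR, 3.0):
        print(c, satisfies_relaxed_hjb(c), barrier_condition_holds(c))
    print(simulate(P_STAR, 0.9))
```

In this toy setting, every c at or above 1 + sqrt(2) satisfies the relaxed inequality, so the relaxed condition admits upper bounds on the optimal value function rather than only the exact solution. The same policies also satisfy the barrier condition on the whole safe set, so safety and optimality do not conflict here. The paper addresses the harder case where such conflicts can occur and the dynamics are unknown.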

Source journal

IFAC Journal of Systems and Control (Automation & Control Systems)

CiteScore: 3.70

Self-citation rate: 5.30%

Publication count