Lagrangian-based online safe reinforcement learning for state-constrained systems

IF 5.9 · Zone 2 (Computer Science) · Q1 Automation & Control Systems

Automatica · Pub Date: 2025-06-20 · DOI: 10.1016/j.automatica.2025.112458

Soutrik Bandyopadhyay, Shubhendu Bhasin

Citations: 0

Abstract

This paper proposes a safe reinforcement learning (RL) algorithm that approximately solves the state-constrained optimal control problem for continuous-time uncertain nonlinear systems. We formulate the safe RL problem as the minimization of a Lagrangian that includes the cost functional and a user-defined barrier Lyapunov function (BLF) encoding the state constraints. We show that the analytical solution obtained by the application of Karush–Kuhn–Tucker (KKT) conditions contains a state-dependent expression for the Lagrange multiplier, which is a function of uncertain terms in the system dynamics. We argue that a naive estimation of the Lagrange multiplier may lead to safety constraint violations. To obviate this challenge, we propose an Actor–Critic–Identifier–Lagrangian (ACIL) algorithm that learns optimal control policies from online data without compromising safety. We provide safety and boundedness guarantees with the proposed algorithm and compare its performance with existing offline/online RL methods via a simulation study.
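
To make the role of the barrier Lyapunov function concrete, here is a minimal sketch in Python. The abstract does not say which BLF the authors use, so the log-type form B(x) = 0.5 * ln(k^2 / (k^2 - x^2)) for a scalar constraint |x| < k is an assumption chosen for illustration, and the names `log_blf` and `log_blf_grad` are hypothetical. The sketch shows only why such a term discourages constraint violations: it is zero at the origin and grows without bound as the state approaches the boundary.

```python
import math


def log_blf(x: float, k: float) -> float:
    """Log-type barrier Lyapunov function for the constraint |x| < k.

    Assumed form (not taken from the paper): 0.5 * ln(k^2 / (k^2 - x^2)).
    """
    if abs(x) >= k:
        raise ValueError("state is outside the safe set |x| < k")
    return 0.5 * math.log(k**2 / (k**2 - x**2))


def log_blf_grad(x: float, k: float) -> float:
    """Derivative of log_blf with respect to x: x / (k^2 - x^2)."""
    if abs(x) >= k:
        raise ValueError("state is outside the safe set |x| < k")
    return x / (k**2 - x**2)


if __name__ == "__main__":
    k = 1.0  # hypothetical constraint bound
    for x in (0.0, 0.5, 0.9, 0.99, 0.999):
        print(f"x={x:<6} B(x)={log_blf(x, k):.4f}  dB/dx={log_blf_grad(x, k):.4f}")
```

Both the value and the gradient blow up near |x| = k, so any cost or Lagrangian containing such a term penalizes trajectories that approach the boundary. How the paper combines this term with the cost functional and the Lagrange multiplier is not specified in the abstract.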

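Source journal: Automatica (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 10.70

Self-citation rate: 7.80%

Articles published: 617

Review time: 5 months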

About the journal: Automatica is a leading archival publication in the field of systems and control. The field today encompasses a broad set of areas and topics, and it is thriving not only within itself but also in its impact on other fields such as communications, computers, biology, energy, and economics. Founded in 1963, Automatica became a journal of the International Federation of Automatic Control (IFAC) in 1969. It has kept abreast of the field's evolution over the years and has emerged as a leading publication driving its trends. It features a characteristic blend of theoretical and applied papers of archival, lasting value, reporting cutting-edge research results by authors across the globe. Articles appear in distinct categories, including regular, brief, and survey papers, technical communiqués, correspondence items, and reviews of published books of interest to the readership. It occasionally publishes special issues on emerging or established topics of interest to a broad audience. Automatica solicits original, high-quality contributions in all the categories listed above and in all areas of systems and control, interpreted in a broad and constantly evolving sense. Contributions may be submitted directly to a subject editor, or to the Editor-in-Chief if the author is unsure of the subject area. Editorial procedures ensure careful, fair, and prompt handling of all submitted articles. Accepted papers appear in the journal in the shortest time feasible given production constraints.