Adaptive Safety-Certified Reinforcement Learning for Constrained Optimal Control of Autonomous Robots With Uncertainties

IF 8.9, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Internet of Things Journal, Pub Date: 2025-04-02, DOI: 10.1109/JIOT.2025.3554521

Fei Zhang; Guang-Hong Yang

Vol. 12, No. 13, pp. 23154-23168. IEEE Xplore: https://ieeexplore.ieee.org/document/10947350/

Citations: 0

Abstract

This article investigates a constrained optimal control problem for safety-critical robots with parametric uncertainties. A novel adaptive safety-certified reinforcement learning (RL) algorithm is proposed that leverages control barrier functions (CBFs) to enable safe learning of the optimal policy during the online exploration phase. Specifically, a high-order robust adaptive CBF is presented that minimally adjusts RL-derived control actions and incorporates a prescribed-time adaptation law to handle the unknown system parameters. This approach directly enforces forward invariance and allows the shrunken safe set to approach the standard safe set within a user-prescribed time. Moreover, a novel adaptive critic learning framework is presented that introduces filtered auxiliary signals integrating both instantaneous and historical data; this relaxes the strict persistent excitation (PE) condition required by existing RL methods to a weaker, easily verifiable finite excitation (FE) condition. Furthermore, a prescribed-time learning rule is developed to accelerate the convergence of the weights. The key advantage of the proposed approach is the decoupling of safety from RL convergence, which allows each component to be managed separately and thereby offers stronger safety certification than existing RL schemes, even under uncertain dynamics. The effectiveness and superiority of the proposed scheme are demonstrated via simulations of surveillance and regulation tasks for autonomous robots.
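The abstract's safety mechanism is a CBF that minimally adjusts the RL-derived action. The sketch below illustrates only that generic idea, under simplifying assumptions that are not the paper's design: a first-order CBF h(x), exactly known control-affine dynamics (so no adaptation law and no robustness margin), and a single constraint. In that case the quadratic program min ||u - u_rl||^2 subject to L_f h + L_g h u + alpha h >= 0 reduces to a projection onto a half-space. The function name `cbf_safety_filter`, the gain `alpha`, and the obstacle example are hypothetical.

```python
import numpy as np


def cbf_safety_filter(u_rl, Lf_h, Lg_h, h, alpha=1.0):
    """Minimally modify u_rl so that Lf_h + Lg_h @ u + alpha * h >= 0.

    Hypothetical first-order CBF filter with known dynamics (no adaptation).
    Solves  min_u ||u - u_rl||^2  s.t.  Lg_h @ u >= -(Lf_h + alpha * h)
    in closed form: with one affine constraint this is a half-space projection.
    """
    u_rl = np.asarray(u_rl, dtype=float)
    a = np.asarray(Lg_h, dtype=float)
    b = -(Lf_h + alpha * h)          # right-hand side of the affine constraint
    slack = a @ u_rl - b
    if slack >= 0.0:
        return u_rl                  # nominal RL action is already safe
    denom = a @ a
    if denom == 0.0:
        raise ValueError("Lg_h vanishes: the constraint cannot be enforced through u")
    # Smallest correction along a that puts u exactly on the constraint boundary.
    return u_rl + (-slack / denom) * a


# Example: single integrator x' = u, circular obstacle of radius r centred at c.
x, c, r = np.array([0.0, 0.0]), np.array([1.0, 0.0]), 0.5
h = (x - c) @ (x - c) - r**2         # h > 0 outside the obstacle
Lg_h = 2.0 * (x - c)                 # grad h times g(x) = I; L_f h = 0 here
u_safe = cbf_safety_filter(np.array([1.0, 0.0]), 0.0, Lg_h, h)
print(u_safe)
```

Here the RL action (1, 0) drives straight at the obstacle, and the filter scales it down to (0.375, 0), which lies exactly on the constraint boundary; an action that already satisfies the constraint is returned unchanged. The high-order robust adaptive CBF described in the abstract would additionally rely on parameter estimates and a prescribed-time adaptation law, which this sketch omits.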
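The relaxation from PE to FE can be illustrated with a generic filtered-regressor estimator. This is a minimal sketch that makes several assumptions. A linear-in-parameters relation y = phi(t)^T w* stands in for the critic's Bellman equation. The auxiliary signals P and Q are integrated with no forgetting. The update is an ordinary gradient law with exponential convergence, not the paper's prescribed-time learning rule. Because Q = P w* holds identically, the estimate keeps converging after the regressor stops being exciting, provided P became positive definite during some finite window. The function `fe_filtered_regression`, its regressor, and all gains are hypothetical.

```python
import numpy as np


def fe_filtered_regression(w_star, t_end=20.0, t_excite=2.0, dt=1e-3, gamma=5.0):
    """Estimate w_star from y = phi(t) @ w_star using integrated auxiliary signals.

    Hypothetical illustration of finite excitation (FE):
      P(t) = int phi phi^T dt,  Q(t) = int phi y dt   (no forgetting, P never shrinks)
      w_hat' = -gamma * (P w_hat - Q) = -gamma * P (w_hat - w_star)
    The regressor is exciting only for t < t_excite, so PE does not hold.
    """
    w_star = np.asarray(w_star, dtype=float)
    assert w_star.size == 2, "the hypothetical regressor below is 2-dimensional"
    P = np.zeros((2, 2))
    Q = np.zeros(2)
    w_hat = np.zeros(2)
    for i in range(int(round(t_end / dt))):
        t = i * dt
        # Rich regressor on [0, t_excite), then zero (e.g. state regulated to rest).
        phi = np.array([np.sin(t), np.cos(t)]) if t < t_excite else np.zeros(2)
        y = phi @ w_star                      # measured signal, linear in w_star
        P += np.outer(phi, phi) * dt          # historical data enters through P, Q
        Q += phi * y * dt
        w_hat += -gamma * (P @ w_hat - Q) * dt  # forward-Euler gradient step
    lam_min = np.linalg.eigvalsh(P)[0]        # FE holds if this is strictly positive
    return w_hat, lam_min


w_hat, lam_min = fe_filtered_regression([2.0, -1.0])
print(w_hat, lam_min)
```

With w* = (2, -1) and excitation only on [0, 2) s, the smallest eigenvalue of P settles at roughly 0.55 and the estimate converges to w*. A purely instantaneous gradient law, w_hat' = -gamma phi (phi^T w_hat - y), would instead stop learning as soon as phi vanishes.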

Source journal

IEEE Internet of Things Journal (Computer Science-Information Systems)

CiteScore: 17.60
Self-citation rate: 13.20%
Articles published: 1982

Journal description: The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impacts on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architectures such as things-centric, data-centric, and service-oriented IoT architectures; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and testbeds such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in standard development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.