Online Reinforcement Learning Algorithm Design for Adaptive Optimal Consensus Control Under Interval Excitation

Impact Factor 8.7 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems)
Yong Xu;Qi-Yue Che;Meng-Ying Wan;Di Mei;Zheng-Guang Wu
{"title":"Online Reinforcement Learning Algorithm Design for Adaptive Optimal Consensus Control Under Interval Excitation","authors":"Yong Xu;Qi-Yue Che;Meng-Ying Wan;Di Mei;Zheng-Guang Wu","doi":"10.1109/TSMC.2025.3583212","DOIUrl":null,"url":null,"abstract":"This article proposes online data-based reinforcement learning (RL) algorithm for adaptive output consensus control of heterogeneous multiagent systems (MASs) with unknown dynamics. First, we employ the adaptive control technique to design a distributed observer, which provides an estimation of the leader for partial agents, thereby eliminating the need for the global information. Then, we propose a novel data-based adaptive dynamic programming (ADP) approach, associated with a double-integrator operator, to develop an online data-driven learning algorithm for learning the optimal control policy. However, existing optimal control strategy learning algorithms rely on the persistent excitation conditions (PECs), the full-rank condition, and the offline storage of historical data. To address these issues, our proposed method learns the optimal control policy online by solving a data-driven linear regression equations (LREs) based on an online-verifiable interval excitation (IE) condition, instead of relying on PEC. In addition, the uniqueness of the LRE solution is established by verifying the invertibility of a matrix, instead of satisfying the full-rank condition related to PEC and historical data storage as required in existing algorithms. It is demonstrated that our proposed learning algorithm not only guarantees optimal tracking with unknown dynamics but also relaxes some of the strict conditions of existing learning algorithms. Finally, a numerical example is provided to validate the effectiveness and performance of the proposed algorithms.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7325-7334"},"PeriodicalIF":8.7000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man Cybernetics-Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11078394/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

This article proposes an online data-based reinforcement learning (RL) algorithm for adaptive output consensus control of heterogeneous multiagent systems (MASs) with unknown dynamics. First, we employ an adaptive control technique to design a distributed observer that provides an estimate of the leader for a subset of agents, thereby eliminating the need for global information. Then, we propose a novel data-based adaptive dynamic programming (ADP) approach, associated with a double-integrator operator, to develop an online data-driven learning algorithm for learning the optimal control policy. Existing optimal control policy learning algorithms, however, rely on persistent excitation conditions (PECs), a full-rank condition, and offline storage of historical data. To address these issues, the proposed method learns the optimal control policy online by solving data-driven linear regression equations (LREs) based on an online-verifiable interval excitation (IE) condition instead of relying on a PEC. In addition, the uniqueness of the LRE solution is established by verifying the invertibility of a matrix, rather than by satisfying the full-rank condition tied to the PEC and historical data storage required by existing algorithms. It is demonstrated that the proposed learning algorithm not only guarantees optimal tracking under unknown dynamics but also relaxes some of the strict conditions of existing learning algorithms. Finally, a numerical example is provided to validate the effectiveness and performance of the proposed algorithms.
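To make the LRE idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): it assumes a generic data-driven regression of the form Θw = ξ built from data collected over a finite excitation interval, uses invertibility (positive definiteness) of the Gram matrix ΘᵀΘ as a stand-in for the online-verifiable IE condition, and returns the unique least-squares solution. The function name `solve_lre`, the tolerance, and the toy data are illustrative assumptions.

```python
import numpy as np

def solve_lre(theta: np.ndarray, xi: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Solve a data-driven linear regression equation Theta @ w = xi.

    The excitation check below mimics an interval-excitation-style condition:
    the Gram matrix Theta^T Theta, assembled from a finite data window, must be
    invertible (smallest eigenvalue bounded away from zero), which guarantees
    a unique solution without requiring persistent excitation.
    """
    gram = theta.T @ theta
    lam_min = np.linalg.eigvalsh(gram).min()   # online-verifiable excitation check
    if lam_min < tol:
        raise RuntimeError("Data not sufficiently exciting: Gram matrix is (near-)singular.")
    # Unique least-squares solution of the LRE.
    return np.linalg.solve(gram, theta.T @ xi)

# Toy usage with synthetic regressor data from a finite interval (hypothetical).
rng = np.random.default_rng(0)
theta = rng.standard_normal((50, 4))    # stacked regressor rows
w_true = np.array([1.0, -2.0, 0.5, 3.0])
xi = theta @ w_true                     # stacked measurements
print(solve_lre(theta, xi))             # recovers w_true
```

In this sketch the excitation test is checked once on the finite data window, which mirrors the abstract's point that the solution's uniqueness follows from matrix invertibility rather than from a persistent-excitation assumption or stored historical data.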
Source Journal
IEEE Transactions on Systems Man Cybernetics-Systems
Categories: Automation & Control Systems; Computer Science, Cybernetics
CiteScore: 18.50
Self-citation rate: 11.50%
Annual publications: 812
Review time: 6 months
Journal description: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.