Online Reinforcement Learning Algorithm Design for Adaptive Optimal Consensus Control Under Interval Excitation

Impact Factor 8.7 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems)
Yong Xu;Qi-Yue Che;Meng-Ying Wan;Di Mei;Zheng-Guang Wu
{"title":"Online Reinforcement Learning Algorithm Design for Adaptive Optimal Consensus Control Under Interval Excitation","authors":"Yong Xu;Qi-Yue Che;Meng-Ying Wan;Di Mei;Zheng-Guang Wu","doi":"10.1109/TSMC.2025.3583212","DOIUrl":null,"url":null,"abstract":"This article proposes online data-based reinforcement learning (RL) algorithm for adaptive output consensus control of heterogeneous multiagent systems (MASs) with unknown dynamics. First, we employ the adaptive control technique to design a distributed observer, which provides an estimation of the leader for partial agents, thereby eliminating the need for the global information. Then, we propose a novel data-based adaptive dynamic programming (ADP) approach, associated with a double-integrator operator, to develop an online data-driven learning algorithm for learning the optimal control policy. However, existing optimal control strategy learning algorithms rely on the persistent excitation conditions (PECs), the full-rank condition, and the offline storage of historical data. To address these issues, our proposed method learns the optimal control policy online by solving a data-driven linear regression equations (LREs) based on an online-verifiable interval excitation (IE) condition, instead of relying on PEC. In addition, the uniqueness of the LRE solution is established by verifying the invertibility of a matrix, instead of satisfying the full-rank condition related to PEC and historical data storage as required in existing algorithms. It is demonstrated that our proposed learning algorithm not only guarantees optimal tracking with unknown dynamics but also relaxes some of the strict conditions of existing learning algorithms. Finally, a numerical example is provided to validate the effectiveness and performance of the proposed algorithms.","PeriodicalId":48915,"journal":{"name":"IEEE Transactions on Systems Man Cybernetics-Systems","volume":"55 10","pages":"7325-7334"},"PeriodicalIF":8.7000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Systems Man Cybernetics-Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11078394/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

This article proposes an online data-based reinforcement learning (RL) algorithm for adaptive output consensus control of heterogeneous multiagent systems (MASs) with unknown dynamics. First, we employ an adaptive control technique to design a distributed observer that provides an estimate of the leader for a subset of agents, thereby eliminating the need for global information. Then, we propose a novel data-based adaptive dynamic programming (ADP) approach, associated with a double-integrator operator, to develop an online data-driven learning algorithm for learning the optimal control policy. Existing optimal control policy learning algorithms, however, rely on persistent excitation conditions (PECs), a full-rank condition, and offline storage of historical data. To address these issues, the proposed method learns the optimal control policy online by solving data-driven linear regression equations (LREs) based on an online-verifiable interval excitation (IE) condition instead of relying on a PEC. In addition, the uniqueness of the LRE solution is established by verifying the invertibility of a matrix, rather than by satisfying the full-rank condition tied to the PEC and historical data storage required by existing algorithms. It is demonstrated that the proposed learning algorithm not only guarantees optimal tracking under unknown dynamics but also relaxes some of the strict conditions of existing learning algorithms. Finally, a numerical example is provided to validate the effectiveness and performance of the proposed algorithms.
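To make the LRE idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): it assumes a generic data-driven regression of the form Θw = ξ built from data collected over a finite excitation interval, uses invertibility (positive definiteness) of the Gram matrix ΘᵀΘ as a stand-in for the online-verifiable IE condition, and returns the unique least-squares solution. The function name `solve_lre`, the tolerance, and the toy data are illustrative assumptions.

```python
import numpy as np

def solve_lre(theta: np.ndarray, xi: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Solve a data-driven linear regression equation Theta @ w = xi.

    The excitation check below mimics an interval-excitation-style condition:
    the Gram matrix Theta^T Theta, assembled from a finite data window, must be
    invertible (smallest eigenvalue bounded away from zero), which guarantees
    a unique solution without requiring persistent excitation.
    """
    gram = theta.T @ theta
    lam_min = np.linalg.eigvalsh(gram).min()   # online-verifiable excitation check
    if lam_min < tol:
        raise RuntimeError("Data not sufficiently exciting: Gram matrix is (near-)singular.")
    # Unique least-squares solution of the LRE.
    return np.linalg.solve(gram, theta.T @ xi)

# Toy usage with synthetic regressor data from a finite interval (hypothetical).
rng = np.random.default_rng(0)
theta = rng.standard_normal((50, 4))    # stacked regressor rows
w_true = np.array([1.0, -2.0, 0.5, 3.0])
xi = theta @ w_true                     # stacked measurements
print(solve_lre(theta, xi))             # recovers w_true
```

In this sketch the excitation test is checked once on the finite data window, which mirrors the abstract's point that the solution's uniqueness follows from matrix invertibility rather than from a persistent-excitation assumption or stored historical data.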
Source Journal
IEEE Transactions on Systems Man Cybernetics-Systems
Categories: Automation & Control Systems; Computer Science, Cybernetics
CiteScore: 18.50
Self-citation rate: 11.50%
Annual publications: 812
Review time: 6 months
Journal description: The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.