Prescribed-Time Human-in-the-Loop Optimal Synchronization Control for Multiagent Systems Under DoS Attacks via Reinforcement Learning

IF 8.9 | CAS Region 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Zongsheng Huang, Tieshan Li, Yue Long, Hongjing Liang
{"title":"基于强化学习的DoS攻击下多智能体系统规定时间人在环最优同步控制。","authors":"Zongsheng Huang,Tieshan Li,Yue Long,Hongjing Liang","doi":"10.1109/tnnls.2025.3583248","DOIUrl":null,"url":null,"abstract":"The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. Finally, simulation results verify the effectiveness of the proposed control scheme.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"41 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prescribed-Time Human-in-the-Loop Optimal Synchronization Control for Multiagent Systems Under DoS Attacks via Reinforcement Learning.\",\"authors\":\"Zongsheng Huang,Tieshan Li,Yue Long,Hongjing Liang\",\"doi\":\"10.1109/tnnls.2025.3583248\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. 
Finally, simulation results verify the effectiveness of the proposed control scheme.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"41 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3583248\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3583248","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. Finally, simulation results verify the effectiveness of the proposed control scheme.
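
To make the observer mechanism concrete, here is a minimal numerical sketch of a consensus-style leader-output observer with a prescribed-time scaling gain. Everything specific here is an assumption for illustration: the gain form `pt_gain`, its parameters `T`, `h`, and `mu_max`, and the pinning mask are generic choices, not the paper's exact construction. The saturation at `mu_max` only mimics the stated property that the observer gain remains bounded at the PT point.

```python
import numpy as np

def pt_gain(t, T=5.0, h=2.0, mu_max=1e3):
    """Hypothetical prescribed finite-time scaling function.

    Grows like (T / (T - t))**h as t approaches the prescribed time T,
    but is saturated at mu_max so the gain stays bounded at (and after)
    the PT point, mirroring the boundedness property the paper states.
    """
    if t < T:
        return min((T / (T - t)) ** h, mu_max)
    return mu_max  # hold the bounded gain after the prescribed time

def observer_step(eta, y0, A_adj, pinned, t, dt, c=1.0):
    """One Euler step of a consensus-style observer of the leader output.

    eta    : (N, n) array, each follower's estimate of the leader output
    y0     : (n,) leader output, visible only to pinned agents
    A_adj  : (N, N) adjacency matrix of the current graph; a link-based
             DoS attack is modeled by zeroing the attacked entries
    pinned : (N,) 0/1 mask of agents with a direct link to the leader
    """
    mu = pt_gain(t)
    err = np.zeros_like(eta)
    for i in range(eta.shape[0]):
        err[i] = sum(A_adj[i, j] * (eta[j] - eta[i])
                     for j in range(eta.shape[0]))
        err[i] += pinned[i] * (y0 - eta[i])
    return eta + dt * c * mu * err
```

A link-based DoS attack is simulated by zeroing attacked entries of `A_adj` between steps, which is exactly the topology switching the abstract describes. Note that each agent only uses its own neighbors' estimates, so no global topology information enters the update; the practical PT convergence guarantee under switching comes from the paper's analysis, not from this sketch.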
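The learning stage can be sketched in the same spirit. Below is a heavily simplified illustration of model-free batch Q-learning with a single critic whose weights are fitted by least squares. The quadratic basis `quad_basis`, the discount factor `gamma`, and the value-iteration-style target are standard textbook choices assumed for illustration; the paper's actual Q-learning iteration and its Q-function convergence proof are not reproduced here.

```python
import numpy as np

def quad_basis(z, u):
    """Quadratic basis phi(z, u) for a single critic, Q(z, u) ~ W @ phi(z, u)."""
    v = np.concatenate([z, u])
    # upper-triangular quadratic monomials of the stacked state-input vector
    return np.array([v[i] * v[j]
                     for i in range(v.size) for j in range(i, v.size)])

def critic_update(data, W, gamma=0.95):
    """One least-squares sweep over measured transitions (model-free).

    data : list of (z, u, r, z_next, u_next) tuples collected from the
           real augmented system; no dynamics model is used anywhere
    W    : current critic weight vector (length = quad_basis output size)
    """
    Phi = np.array([quad_basis(z, u) for z, u, _, _, _ in data])
    targets = np.array([r + gamma * (W @ quad_basis(zn, un))
                        for _, _, r, zn, un in data])
    W_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return W_new
```

Iterating `critic_update` from a zero initialization until `W` stops changing plays the role of the Q-function sequence whose convergence the paper proves; the synchronization policy is then read off as the minimizer of `W @ quad_basis(z, u)` over `u`. The single-critic structure is what keeps the computational burden low: only one weight vector is trained, by a plain least-squares solve rather than gradient descent.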
Source Journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Publication volume: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.