Prescribed-Time Human-in-the-Loop Optimal Synchronization Control for Multiagent Systems Under DoS Attacks via Reinforcement Learning

IF 8.9 | CAS Region 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Zongsheng Huang, Tieshan Li, Yue Long, Hongjing Liang
{"title":"基于强化学习的DoS攻击下多智能体系统规定时间人在环最优同步控制。","authors":"Zongsheng Huang,Tieshan Li,Yue Long,Hongjing Liang","doi":"10.1109/tnnls.2025.3583248","DOIUrl":null,"url":null,"abstract":"The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. Finally, simulation results verify the effectiveness of the proposed control scheme.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"41 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Prescribed-Time Human-in-the-Loop Optimal Synchronization Control for Multiagent Systems Under DoS Attacks via Reinforcement Learning.\",\"authors\":\"Zongsheng Huang,Tieshan Li,Yue Long,Hongjing Liang\",\"doi\":\"10.1109/tnnls.2025.3583248\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. 
Finally, simulation results verify the effectiveness of the proposed control scheme.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"41 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-07-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3583248\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3583248","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The prescribed-time (PT) human-in-the-loop (HiTL) optimal synchronization control problem for multiagent systems (MASs) under link-based denial-of-service (DoS) attacks is investigated. First, the HiTL framework enables the human operator to govern the MASs by transmitting commands to the leader. The link-based DoS attacks cause communication blockages between agents, resulting in topology switching. Under the switching communication topology, a fully distributed observer is proposed for each follower, which simultaneously integrates a prescribed finite-time function to estimate the leader's output within the PT. This observer is characterized by a bounded gain at the PT point and guarantees global practical PT convergence, while avoiding the use of global topology information. By combining the follower dynamics with the proposed observer, an augmented system is developed. Subsequently, the model-free Q-learning algorithm is used to learn the optimal synchronization policy directly from real system data. To reduce computational burden, the Q-learning algorithm is implemented using a single critic neural network (NN) structure, with the least-squares method applied to train the NN weights. The convergence of the Q-functions generated by the proposed Q-learning algorithm is proven. Finally, simulation results verify the effectiveness of the proposed control scheme.
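
To make the observer mechanism concrete, here is a minimal numerical sketch of a consensus-style leader-output observer with a prescribed-time scaling gain. Everything specific here is an assumption for illustration: the gain form `pt_gain`, its parameters `T`, `h`, and `mu_max`, and the pinning mask are generic choices, not the paper's exact construction. The saturation at `mu_max` only mimics the stated property that the observer gain remains bounded at the PT point.

```python
import numpy as np

def pt_gain(t, T=5.0, h=2.0, mu_max=1e3):
    """Hypothetical prescribed finite-time scaling function.

    Grows like (T / (T - t))**h as t approaches the prescribed time T,
    but is saturated at mu_max so the gain stays bounded at (and after)
    the PT point, mirroring the boundedness property the paper states.
    """
    if t < T:
        return min((T / (T - t)) ** h, mu_max)
    return mu_max  # hold the bounded gain after the prescribed time

def observer_step(eta, y0, A_adj, pinned, t, dt, c=1.0):
    """One Euler step of a consensus-style observer of the leader output.

    eta    : (N, n) array, each follower's estimate of the leader output
    y0     : (n,) leader output, visible only to pinned agents
    A_adj  : (N, N) adjacency matrix of the current graph; a link-based
             DoS attack is modeled by zeroing the attacked entries
    pinned : (N,) 0/1 mask of agents with a direct link to the leader
    """
    mu = pt_gain(t)
    err = np.zeros_like(eta)
    for i in range(eta.shape[0]):
        err[i] = sum(A_adj[i, j] * (eta[j] - eta[i])
                     for j in range(eta.shape[0]))
        err[i] += pinned[i] * (y0 - eta[i])
    return eta + dt * c * mu * err
```

A link-based DoS attack is simulated by zeroing attacked entries of `A_adj` between steps, which is exactly the topology switching the abstract describes. Note that each agent only uses its own neighbors' estimates, so no global topology information enters the update; the practical PT convergence guarantee under switching comes from the paper's analysis, not from this sketch.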
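The learning stage can be sketched in the same spirit. Below is a heavily simplified illustration of model-free batch Q-learning with a single critic whose weights are fitted by least squares. The quadratic basis `quad_basis`, the discount factor `gamma`, and the value-iteration-style target are standard textbook choices assumed for illustration; the paper's actual Q-learning iteration and its Q-function convergence proof are not reproduced here.

```python
import numpy as np

def quad_basis(z, u):
    """Quadratic basis phi(z, u) for a single critic, Q(z, u) ~ W @ phi(z, u)."""
    v = np.concatenate([z, u])
    # upper-triangular quadratic monomials of the stacked state-input vector
    return np.array([v[i] * v[j]
                     for i in range(v.size) for j in range(i, v.size)])

def critic_update(data, W, gamma=0.95):
    """One least-squares sweep over measured transitions (model-free).

    data : list of (z, u, r, z_next, u_next) tuples collected from the
           real augmented system; no dynamics model is used anywhere
    W    : current critic weight vector (length = quad_basis output size)
    """
    Phi = np.array([quad_basis(z, u) for z, u, _, _, _ in data])
    targets = np.array([r + gamma * (W @ quad_basis(zn, un))
                        for _, _, r, zn, un in data])
    W_new, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return W_new
```

Iterating `critic_update` from a zero initialization until `W` stops changing plays the role of the Q-function sequence whose convergence the paper proves; the synchronization policy is then read off as the minimizer of `W @ quad_basis(z, u)` over `u`. The single-critic structure is what keeps the computational burden low: only one weight vector is trained, by a plain least-squares solve rather than gradient descent.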
Source Journal
IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Publication volume: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.