Multistep Q-Learning-Based Optimal Consensus Control of Linear Discrete-Time Multiagent Systems
Authors: Jialin Xiao, Biao Luo, Xiaodong Xu, Chunhua Yang, Weihua Gui
Journal: IEEE Transactions on Cybernetics (JCR Q1, Automation & Control Systems; impact factor 10.5)
Published: 2025-06-24
DOI: https://doi.org/10.1109/tcyb.2025.3575419
This article considers the optimal consensus control problem for multiagent systems. It develops multiagent multistep Q-learning (MaMsQL), which improves learning efficiency while handling the complex interaction dynamics between agents and environmental uncertainty, and thereby meets the need to balance exploration and exploitation. First, a Q-function associated with the performance index is established, and it is proved that the optimal Q-functions of all agents form a Nash equilibrium, so the consensus problem is converted into that of finding the optimal Q-functions. Then, the MaMsQL method is developed, together with a theoretical proof of its convergence. Finally, the method is implemented through a specially designed actor-critic network. Simulation examples that compare it with multiagent single-step Q-learning verify the effectiveness and superiority of the method.
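The abstract does not give the MaMsQL update itself, so the following is only a minimal sketch of the generic idea that separates multistep from single-step Q-learning: the learning target accumulates several stage costs before bootstrapping from a Q-value estimate. The function names, the scalar agent states, the quadratic stage cost, and the discount factor `gamma` are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Sequence


def local_consensus_error(states: Sequence[float],
                          adjacency_row: Sequence[float],
                          i: int) -> float:
    """Hypothetical local error of agent i: sum_j a_ij * (x_i - x_j)."""
    return sum(a_ij * (states[i] - x_j)
               for a_ij, x_j in zip(adjacency_row, states))


def stage_cost(error: float, control: float,
               q: float = 1.0, r: float = 1.0) -> float:
    """Assumed quadratic stage cost q * e^2 + r * u^2 (scalar case)."""
    return q * error ** 2 + r * control ** 2


def multistep_q_target(costs: Sequence[float],
                       bootstrap_q: float,
                       gamma: float = 1.0) -> float:
    """Generic n-step target in cost-minimization form.

    Sums the n observed stage costs (discounted by gamma) and then adds the
    discounted Q-value estimate at the state reached after n steps.
    With len(costs) == 1 this reduces to the usual single-step target.
    """
    target = 0.0
    for j, cost in enumerate(costs):
        target += (gamma ** j) * cost
    return target + (gamma ** len(costs)) * bootstrap_q


# Agent 0 has two neighbors (agents 1 and 2) in a hypothetical 3-agent graph.
error_0 = local_consensus_error([1.0, 3.0, 2.0], [0.0, 1.0, 1.0], i=0)
cost_0 = stage_cost(error_0, control=1.0)

# Three-step target versus single-step target from the same bootstrap value.
three_step = multistep_q_target([1.0, 2.0, 3.0], bootstrap_q=10.0, gamma=0.5)
one_step = multistep_q_target([1.0], bootstrap_q=10.0, gamma=0.5)
```

In this sketch the three-step target relies less on the bootstrap estimate than the single-step one, which is the usual motivation for multistep methods; whether MaMsQL weights or truncates the cost sum in the same way is not stated in the abstract.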
About the journal:
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines, or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.