Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games.

Impact factor: 10.5 · Zone 1 (Computer Science) · JCR Q1 (Automation & Control Systems)
Lan Liao,Daniel W C Ho,Deming Yuan,Zhan Yu,Baoyong Zhang,Shengyuan Xu
Journal: IEEE Transactions on Cybernetics
Publication date: 2025-09-16
DOI: 10.1109/tcyb.2025.3604774 (https://doi.org/10.1109/tcyb.2025.3604774)
Citation count: 0

Abstract

This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms.
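The abstract names two mechanisms, one-point bandit estimation and quantized communication, without giving their exact form. The sketch below illustrates both under stated assumptions: it uses the standard sphere-sampling one-point gradient estimator and a deterministic uniform rounding quantizer, which may differ from the estimator, quantizer, and update rule used in QDOBO-TN. All function and parameter names (`one_point_gradient`, `quantize`, `average_estimate`, `delta`, `step`) are hypothetical and do not come from the paper.

```python
import math
import random


def one_point_gradient(cost, x, delta, rng):
    """One-point bandit estimate of the gradient of `cost` at `x`.

    Assumption: standard sphere-sampling estimator. It queries `cost` once,
    at x + delta * u with u uniform on the unit sphere, and returns
    (d / delta) * cost(x + delta * u) * u.
    """
    d = len(x)
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = [v / norm for v in g]
    # Single function evaluation near the decision point (bandit feedback).
    value = cost([xi + delta * ui for xi, ui in zip(x, u)])
    return [(d / delta) * value * ui for ui in u]


def quantize(x, step):
    """Assumed uniform quantizer: round each coordinate to the nearest multiple of `step`."""
    return [step * math.floor(xi / step + 0.5) for xi in x]


def average_estimate(cost, x, delta, n_samples, seed):
    """Average many one-point estimates to show they concentrate around a gradient."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        g = one_point_gradient(cost, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]
```

For a quadratic cost such as `lambda z: z[0]**2 + z[1]**2`, the average returned by `average_estimate` approaches the true gradient `2 * x` as `n_samples` grows, although any single estimate is very noisy. That high variance is the price of observing only one function value per round. With this rounding quantizer, each transmitted coordinate differs from the original by at most `step / 2`.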
Source journal
IEEE Transactions on Cybernetics (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 25.40
Self-citation rate: 11.00%
Articles published: 1869
About the journal: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or machine, human, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.