Dynamic Regret of Quantized Distributed Online Bandit Optimization in Zero-Sum Games.

IF 10.5, Zone 1 (Computer Science), JCR Q1 (Automation & Control Systems)

IEEE Transactions on Cybernetics Pub Date : 2025-09-16 DOI:10.1109/tcyb.2025.3604774

Lan Liao, Daniel W C Ho, Deming Yuan, Zhan Yu, Baoyong Zhang, Shengyuan Xu


Citations: 0

Abstract

This article investigates the distributed online optimization problem in a zero-sum game between two distinct time-varying multiagent networks. At each iteration, the agents not only communicate with their neighbors but also gather information about agents in the opposing network through a time-varying network, assigning weights accordingly. Moreover, we consider quantized communication and bandit feedback mechanisms, with agents transmitting quantized information and adopting one-point estimators. At each iteration, agents make and submit decisions and then receive the cost function values near their decision points rather than the full cost function information. To guarantee the payoff of each network, we design an algorithm named quantized distributed online bandit optimization in two-network (QDOBO-TN). We use dynamic Nash equilibrium regret to measure the positive payoff discrepancy between the decision sequence produced by Algorithm QDOBO-TN and the Nash equilibrium sequence. Furthermore, we propose a multiepoch version of Algorithm QDOBO-TN. The regret bounds for both algorithms are sublinear with respect to the iteration count T. Finally, we conduct a series of simulation experiments that further validate the effectiveness of the algorithms.
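The abstract names two mechanisms, a one-point estimator for bandit feedback and quantized communication, without giving their exact form. The following is a minimal Python sketch of what such components commonly look like, not the QDOBO-TN algorithm itself. It assumes the standard spherical-sampling one-point estimator and a plain uniform rounding quantizer; the paper may use different constructions. All function names and parameters (`one_point_gradient`, `uniform_quantize`, `delta`, `step`) are hypothetical.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def one_point_gradient(cost, x, delta, rng):
    """One-point estimate (dim / delta) * cost(x + delta * u) * u.

    Assumed form: only a single cost value near x is queried,
    which is the information available under bandit feedback.
    """
    dim = len(x)
    u = sample_unit_sphere(dim, rng)
    value = cost([xi + delta * ui for xi, ui in zip(x, u)])
    return [(dim / delta) * value * ui for ui in u]


def uniform_quantize(x, step):
    """Assumed quantizer: round each coordinate to the nearest multiple of step."""
    return [step * round(xi / step) for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical quadratic cost, used only to exercise the functions.
    cost = lambda z: sum(zi * zi for zi in z)
    g = one_point_gradient(cost, [0.5, -0.25], delta=0.1, rng=rng)
    print("estimate:", g)
    print("quantized:", uniform_quantize(g, step=0.5))
```

In this sketch an agent would transmit `uniform_quantize(...)` of its state rather than the exact vector, so the per-coordinate communication error is at most half of `step`. The estimator is noisy for small `delta`, which is the usual trade-off with one-point feedback.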




Source journal

IEEE Transactions on Cybernetics COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, CYBERNETICS

CiteScore: 25.40

Self-citation rate: 11.00%

Articles published: 1869

Journal description: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or machine, human, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.