Game-Theoretic Bandits for Network Optimization With High-Probability Swap-Regret Upper Bounds

Impact factor: 3.0 | CAS Zone 3 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture)
Zhiming Huang;Jianping Pan
DOI: 10.1109/TNET.2024.3444593
Journal: IEEE/ACM Transactions on Networking, vol. 32, no. 6, pp. 4855-4870
Published: 2024-08-26
Publication type: Journal Article
Open access: no
Link: https://ieeexplore.ieee.org/document/10645817/
Citations: 0

Abstract

Game-Theoretic Bandits for Network Optimization With High-Probability Swap-Regret Upper Bounds
In this paper, we study a multi-agent bandit problem in an unknown general-sum game repeated for a number of rounds (i.e., learning in a black-box game with bandit feedback), where the agents have no information about the underlying game structure and cannot observe each other's actions or rewards. In each round, each agent plays an arm (i.e., action) from her own, possibly different, arm set (i.e., action set) and receives only the reward of the played arm, which is affected by the other agents' actions. The objective of each agent is to minimize her own cumulative swap regret, where the swap regret is a generic performance measure for online learning algorithms. Many network optimization problems can be cast in the framework of this multi-agent bandit problem, such as wireless medium access control and end-to-end congestion control. We propose an online-mirror-descent-based algorithm and provide near-optimal high-probability swap-regret upper bounds based on refined martingale analyses, which can further bound the expected swap regret instead of the pseudo-regret studied in the literature. Moreover, the high-probability bounds guarantee that correlated equilibria can be achieved in a polynomial number of rounds if the algorithms are played by all agents. To assess the performance of the studied algorithms, we conducted numerical experiments in the context of wireless medium access control, and we performed emulation experiments for end-to-end congestion control by implementing the studied algorithms in the Linux kernel.
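As a rough illustration of the performance measure named in the abstract, the sketch below computes swap regret and external regret in hindsight for one agent. It is not the paper's algorithm: it assumes the full reward vector of every round is available after the fact (which the bandit setting does not provide to the agent), and the function names `swap_regret` and `external_regret` are hypothetical. It uses the standard fact that the best swap function can be chosen independently for each played arm.

```python
from typing import Sequence


def swap_regret(actions: Sequence[int],
                rewards: Sequence[Sequence[float]],
                num_arms: int) -> float:
    """Hindsight swap regret: best gain from remapping each played arm a to some arm b."""
    # gain[a][b] = extra reward from playing b in every round where a was played
    gain = [[0.0] * num_arms for _ in range(num_arms)]
    for a, r in zip(actions, rewards):
        for b in range(num_arms):
            gain[a][b] += r[b] - r[a]
    # The optimal swap function picks the best replacement separately for each arm;
    # keeping the arm unchanged (b == a) gives gain 0, so the result is never negative.
    return sum(max(row) for row in gain)


def external_regret(actions: Sequence[int],
                    rewards: Sequence[Sequence[float]],
                    num_arms: int) -> float:
    """Hindsight external regret: best single fixed arm versus what was played."""
    played = sum(r[a] for a, r in zip(actions, rewards))
    best_fixed = max(sum(r[b] for r in rewards) for b in range(num_arms))
    return best_fixed - played


if __name__ == "__main__":
    # Two arms, four rounds; the agent always picks the arm that pays 0 in that round.
    actions = [0, 0, 1, 1]
    rewards = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    print(swap_regret(actions, rewards, 2))      # 4.0
    print(external_regret(actions, rewards, 2))  # 2.0
```

In this toy sequence the swap regret (4.0) exceeds the external regret (2.0), because swapping arm 0 for arm 1 and arm 1 for arm 0 beats any single fixed arm. Swap regret is always at least as large as external regret, since a constant remapping is one particular swap function.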
Source journal: IEEE/ACM Transactions on Networking (Engineering & Technology: Telecommunications)
CiteScore: 8.20
Self-citation rate: 5.40%
Articles published: 246
Review time: 4-8 weeks
Journal description: The IEEE/ACM Transactions on Networking's high-level objective is to publish high-quality, original research results derived from theoretical or experimental exploration of the area of communication/computer networking, covering all sorts of information transport networks over all sorts of physical layer technologies, both wireline (all kinds of guided media: e.g., copper, optical) and wireless (e.g., radio-frequency, acoustic (e.g., underwater), infra-red), or hybrids of these. The journal welcomes applied contributions reporting on novel experiences and experiments with actual systems.