A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

IF 7.7 | CAS Tier 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS
Jingwen Tong;Xinran Li;Liqun Fu;Jun Zhang;Khaled B. Letaief
{"title":"用于合作资源分配的联合在线无休止强盗框架","authors":"Jingwen Tong;Xinran Li;Liqun Fu;Jun Zhang;Khaled B. Letaief","doi":"10.1109/TMC.2024.3453250","DOIUrl":null,"url":null,"abstract":"Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we fill this gap by investigating a cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework to mitigate the communication overhead and data privacy issue by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys a high communication and computation efficiency, and a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of \n<inline-formula><tex-math>$\\mathcal {O}(\\sqrt{T\\log (T)})$</tex-math></inline-formula>\n and better performance compared with baselines. More importantly, its sample complexity reduces sublinearly with the number of agents.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"23 12","pages":"15274-15288"},"PeriodicalIF":7.7000,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Federated Online Restless Bandit Framework for Cooperative Resource Allocation\",\"authors\":\"Jingwen Tong;Xinran Li;Liqun Fu;Jun Zhang;Khaled B. Letaief\",\"doi\":\"10.1109/TMC.2024.3453250\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we fill this gap by investigating a cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework to mitigate the communication overhead and data privacy issue by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys a high communication and computation efficiency, and a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. 
Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of \\n<inline-formula><tex-math>$\\\\mathcal {O}(\\\\sqrt{T\\\\log (T)})$</tex-math></inline-formula>\\n and better performance compared with baselines. More importantly, its sample complexity reduces sublinearly with the number of agents.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"23 12\",\"pages\":\"15274-15288\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2024-09-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10663957/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10663957/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of the MRPs are known a priori, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we fill this gap by investigating a cooperative resource allocation problem with unknown MRP system dynamics. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework that mitigates the communication overhead and data privacy issues by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys high communication and computation efficiency as well as a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm in the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance than the baselines. More importantly, its sample complexity decreases sublinearly with the number of agents.
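
To make the ingredients named in the abstract more concrete, the sketch below combines Thompson sampling over unknown two-state channel dynamics with a bisection-based Whittle index and periodic federated pooling of the agents' posterior counts. It is a minimal illustration under stated assumptions (fully observed channel states, a known hypothetical idle-dynamics matrix `P_IDLE`, independent channel copies per agent, and hypothetical names throughout); it is not the paper's FedTSWI implementation and makes no claim about the exact index computation, message format, or regret guarantees of the published algorithm.

```python
# Illustrative sketch only (not the paper's FedTSWI implementation): Thompson sampling
# over unknown two-state channel dynamics, a bisection-based Whittle index, and periodic
# federated pooling of the agents' posterior counts. Names, the idle-dynamics matrix, and
# the observation model (states observed, transition probabilities unknown) are assumptions.
import numpy as np

rng = np.random.default_rng(0)


def whittle_index(P_active, P_passive, state, reward=(0.0, 1.0), gamma=0.95, tol=1e-3):
    """Subsidy that equalises 'play' and 'rest' in `state` (assumes the arm is indexable)."""
    r = np.asarray(reward, dtype=float)

    def play_minus_rest(lam):
        V = np.zeros(2)
        for _ in range(300):                      # value iteration on the single-arm MDP
            q_play = r + gamma * P_active @ V     # reward is earned only when the arm is played
            q_rest = lam + gamma * P_passive @ V  # the resting arm collects the subsidy lam
            V_new = np.maximum(q_play, q_rest)
            if np.max(np.abs(V_new - V)) < 1e-8:
                break
            V = V_new
        return (q_play - q_rest)[state]

    lo, hi = -1.0, 2.0                            # ample bracket for rewards in [0, 1]
    while hi - lo > tol:                          # bisection over the subsidy
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if play_minus_rest(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)


class AgentPosterior:
    """Beta posteriors over each arm's probability of moving to the 'good' state when played."""

    def __init__(self, n_arms):
        self.base = np.ones((n_arms, 2, 2))        # synced counts [arm, from, to], Beta(1,1) prior
        self.local = np.zeros((n_arms, 2, 2))      # observations made since the last sync

    def sample_models(self):
        c = self.base + self.local
        p_good = rng.beta(c[:, :, 1], c[:, :, 0])  # one draw of P(next = good | from_state) per arm
        return [np.array([[1 - p_good[a, 0], p_good[a, 0]],
                          [1 - p_good[a, 1], p_good[a, 1]]]) for a in range(len(p_good))]

    def update(self, arm, s_from, s_to):
        self.local[arm, s_from, s_to] += 1


def federated_sync(agents):
    """Server step: pool the agents' new counts (sufficient statistics, not raw trajectories)."""
    pooled = agents[0].base + sum(a.local for a in agents)
    for a in agents:
        a.base, a.local = pooled.copy(), np.zeros_like(a.local)


# Toy run: K agents, N channels, each agent accesses the M highest-index channels per slot.
K, N, M, H, T = 3, 5, 2, 50, 200
P_IDLE = np.array([[0.9, 0.1],                     # hypothetical known dynamics of an idle channel
                   [0.3, 0.7]])
true_p = rng.uniform(0.2, 0.9, size=(N, 2))        # hidden P(next = good | from_state) when played
agents = [AgentPosterior(N) for _ in range(K)]
states = np.zeros((K, N), dtype=int)               # each agent sees its own copy of the channels

for t in range(T):
    for k, agent in enumerate(agents):
        models = agent.sample_models()             # Thompson sampling: act as if the draw were true
        idx = [whittle_index(models[a], P_IDLE, states[k, a]) for a in range(N)]
        chosen = set(np.argsort(idx)[-M:])         # play the M arms with the largest indices
        for a in range(N):                         # restless: every arm keeps evolving
            if a in chosen:
                s_next = int(rng.random() < true_p[a, states[k, a]])
                agent.update(a, states[k, a], s_next)
            else:
                s_next = int(rng.random() < P_IDLE[states[k, a], 1])
            states[k, a] = s_next
    if (t + 1) % H == 0:
        federated_sync(agents)                     # periodic exchange of posterior counts
```

Exchanging Beta posterior counts rather than raw observation trajectories is one way a federated scheme can reduce communication and avoid sharing raw data; how closely this mirrors the messages actually exchanged by FedTSWI is an assumption here.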
Source Journal
IEEE Transactions on Mobile Computing
Category: Engineering & Technology, Telecommunications
CiteScore: 12.90
Self-citation rate: 2.50%
Articles published: 403
Review time: 6.6 months
About the journal: IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.