分布式随机强盗中的信息共享

Swapna Buccapatnam, Jian Tan, Li Zhang
{"title":"分布式随机强盗中的信息共享","authors":"Swapna Buccapatnam, Jian Tan, Li Zhang","doi":"10.1109/INFOCOM.2015.7218651","DOIUrl":null,"url":null,"abstract":"Information sharing is an important issue for stochastic bandit problems in a distributed setting. Consider N players dealing with the same multi-armed bandit problem. All players receive requests simultaneously and must choose one of M actions for each request. Sharing information among these N players can decrease the regret for each of them but also incurs cooperation and communication overhead. In this setting, we study how cooperation and communication can impact the system performance measured by regret and communication cost. For both scenarios, we establish a uniform lower bound to the regret for the entire system as a function of time and network size. Concerning cooperation, we study the problem from a game-theoretic perspective. When each player's actions and payoffs are immediately visible to all others, we identify strategies for all players under which co-operative exploration is ensured. Regarding the communication cost, we consider incomplete information sharing such that a player's payoffs and actions are not entirely available to others. The players communicate observations to each other to reduce their regret, however with a cost. We show that a logarithmic communication cost is necessary to achieve the optimal regret. For Bernoulli arrivals, we specify a policy that achieves the optimal regret with a logarithmic communication cost. Our work opens a novel direction towards understanding information sharing for active learning in a distributed environment.","PeriodicalId":342583,"journal":{"name":"2015 IEEE Conference on Computer Communications (INFOCOM)","volume":"263 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":"{\"title\":\"Information sharing in distributed stochastic bandits\",\"authors\":\"Swapna Buccapatnam, Jian Tan, Li Zhang\",\"doi\":\"10.1109/INFOCOM.2015.7218651\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Information sharing is an important issue for stochastic bandit problems in a distributed setting. Consider N players dealing with the same multi-armed bandit problem. All players receive requests simultaneously and must choose one of M actions for each request. Sharing information among these N players can decrease the regret for each of them but also incurs cooperation and communication overhead. In this setting, we study how cooperation and communication can impact the system performance measured by regret and communication cost. For both scenarios, we establish a uniform lower bound to the regret for the entire system as a function of time and network size. Concerning cooperation, we study the problem from a game-theoretic perspective. When each player's actions and payoffs are immediately visible to all others, we identify strategies for all players under which co-operative exploration is ensured. Regarding the communication cost, we consider incomplete information sharing such that a player's payoffs and actions are not entirely available to others. The players communicate observations to each other to reduce their regret, however with a cost. We show that a logarithmic communication cost is necessary to achieve the optimal regret. For Bernoulli arrivals, we specify a policy that achieves the optimal regret with a logarithmic communication cost. Our work opens a novel direction towards understanding information sharing for active learning in a distributed environment.\",\"PeriodicalId\":342583,\"journal\":{\"name\":\"2015 IEEE Conference on Computer Communications (INFOCOM)\",\"volume\":\"263 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-08-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"38\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE Conference on Computer Communications (INFOCOM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INFOCOM.2015.7218651\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Conference on Computer Communications (INFOCOM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOM.2015.7218651","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

摘要

信息共享是分布式环境下随机盗匪问题的一个重要问题。假设N个玩家面对相同的多手强盗问题。所有玩家同时收到请求,并且必须为每个请求选择M种行动中的一种。在这N个玩家之间分享信息可以减少他们每个人的遗憾,但也会增加合作和沟通的开销。在此背景下,我们研究合作与沟通如何影响以后悔和沟通成本衡量的系统绩效。对于这两种情况,我们为整个系统建立了一个统一的遗憾下界,作为时间和网络大小的函数。关于合作问题,我们从博弈论的角度来研究。当每个玩家的行动和收益都能立即被所有人看到时,我们就为所有玩家确定了确保合作探索的策略。关于沟通成本,我们考虑的是不完全信息共享,即玩家的收益和行动并不完全为他人所知。玩家相互交流观察结果以减少后悔,但这是有代价的。我们证明了对数通信代价是实现最优遗憾的必要条件。对于伯努利到达,我们指定了一个以对数通信成本实现最优后悔的策略。我们的工作为理解分布式环境中主动学习的信息共享开辟了一个新的方向。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Information sharing in distributed stochastic bandits
Information sharing is an important issue for stochastic bandit problems in a distributed setting. Consider N players dealing with the same multi-armed bandit problem. All players receive requests simultaneously and must choose one of M actions for each request. Sharing information among these N players can decrease the regret for each of them but also incurs cooperation and communication overhead. In this setting, we study how cooperation and communication can impact the system performance measured by regret and communication cost. For both scenarios, we establish a uniform lower bound to the regret for the entire system as a function of time and network size. Concerning cooperation, we study the problem from a game-theoretic perspective. When each player's actions and payoffs are immediately visible to all others, we identify strategies for all players under which co-operative exploration is ensured. Regarding the communication cost, we consider incomplete information sharing such that a player's payoffs and actions are not entirely available to others. The players communicate observations to each other to reduce their regret, however with a cost. We show that a logarithmic communication cost is necessary to achieve the optimal regret. For Bernoulli arrivals, we specify a policy that achieves the optimal regret with a logarithmic communication cost. Our work opens a novel direction towards understanding information sharing for active learning in a distributed environment.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信