Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

Lucia Gordon, Esther Rolf, Milind Tambe
{"title":"Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents","authors":"Lucia Gordon, Esther Rolf, Milind Tambe","doi":"arxiv-2408.03405","DOIUrl":null,"url":null,"abstract":"Stochastic multi-agent multi-armed bandits typically assume that the rewards\nfrom each arm follow a fixed distribution, regardless of which agent pulls the\narm. However, in many real-world settings, rewards can depend on the\nsensitivity of each agent to their environment. In medical screening, disease\ndetection rates can vary by test type; in preference matching, rewards can\ndepend on user preferences; and in environmental sensing, observation quality\ncan vary across sensors. Since past work does not specify how to allocate\nagents of heterogeneous but known sensitivity of these types in a stochastic\nbandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates\ninformation from diverse agents. In doing so, we address the joint challenges\nof (i) aggregating the rewards, which follow different distributions for each\nagent-arm pair, and (ii) coordinating the assignments of agents to arms.\nMin-Width facilitates efficient collaboration among heterogeneous agents,\nexploiting the known structure in the agents' reward functions to weight their\nrewards accordingly. We analyze the regret of Min-Width and conduct\npseudo-synthetic and fully synthetic experiments to study the performance of\ndifferent levels of information sharing. Our results confirm that the gains to\nmodeling agent heterogeneity tend to be greater when the sensitivities are more\nvaried across agents, while combining more information does not always improve\nperformance.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.03405","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to their environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents of heterogeneous but known sensitivity of these types in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating the rewards, which follow different distributions for each agent-arm pair, and (ii) coordinating the assignments of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents' reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains to modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.
结合不同信息,协调行动:异构代理的随机匪徒算法
随机多代理多臂强盗通常假定,无论哪个代理拉动手臂,每个手臂的奖励都遵循固定的分布。然而,在现实世界的许多环境中,奖励可能取决于每个代理对环境的敏感度。在医疗筛查中,疾病检测率可能因检测类型而异;在偏好匹配中,奖励取决于用户的偏好;在环境感知中,不同传感器的观测质量可能不同。由于过去的工作没有明确说明如何在随机带位设置中分配这些类型的异构但已知灵敏度的代理,因此我们引入了一种 UCB 类型的算法 Min-Width,它可以聚合来自不同代理的信息。Min-Width 促进了异构代理之间的高效协作,利用代理奖励函数中的已知结构对其奖励进行相应加权。我们分析了 Min-Width 的遗憾,并进行了伪合成和全合成实验,以研究不同信息共享水平的性能。我们的结果证实,当各代理的敏感性差异较大时,模拟代理异质性的收益往往更大,而结合更多信息并不总能提高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信