Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents

arXiv - CS - Multiagent Systems. Pub Date: 2024-08-06. DOI: arxiv-2408.03405

Lucia Gordon, Esther Rolf, Milind Tambe

Citations: 0

Abstract

Stochastic multi-agent multi-armed bandits typically assume that the rewards from each arm follow a fixed distribution, regardless of which agent pulls the arm. However, in many real-world settings, rewards can depend on the sensitivity of each agent to their environment. In medical screening, disease detection rates can vary by test type; in preference matching, rewards can depend on user preferences; and in environmental sensing, observation quality can vary across sensors. Since past work does not specify how to allocate agents of these types, whose sensitivities are heterogeneous but known, in a stochastic bandit setting, we introduce a UCB-style algorithm, Min-Width, which aggregates information from diverse agents. In doing so, we address the joint challenges of (i) aggregating the rewards, which follow different distributions for each agent-arm pair, and (ii) coordinating the assignments of agents to arms. Min-Width facilitates efficient collaboration among heterogeneous agents, exploiting the known structure in the agents' reward functions to weight their rewards accordingly. We analyze the regret of Min-Width and conduct pseudo-synthetic and fully synthetic experiments to study the performance of different levels of information sharing. Our results confirm that the gains from modeling agent heterogeneity tend to be greater when the sensitivities are more varied across agents, while combining more information does not always improve performance.
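
The abstract does not give Min-Width's update rules, so the following is only a minimal Python sketch of the two challenges it names, built on assumptions that are ours and not the paper's. We assume agent j has a known sensitivity s_j in (0, 1], that pulling arm i yields a Bernoulli reward with mean s_j * mu_i, that each agent pulls a distinct arm per round, and that the index is a Hoeffding-style UCB. Rewards are pooled across agents as (sum of rewards) / (sum of sensitivities), and agents are matched to arms by sorting. The names pooled_estimate, confidence_width and run, and the matching rule, are hypothetical; this is not the authors' Min-Width.

```python
import math
import random


def pooled_estimate(reward_sum, sensitivity_sum):
    """Estimate an arm's base mean mu from rewards pooled across agents.

    Assumption: E[reward | agent j] = s_j * mu, so E[sum of rewards] = mu * sum of s_j.
    """
    if sensitivity_sum == 0:
        return 0.0
    return reward_sum / sensitivity_sum


def confidence_width(num_pulls, sensitivity_sum, t):
    """Hypothetical Hoeffding-style width on mu.

    The sum of num_pulls rewards in [0, 1] exceeds its mean by more than
    sqrt(2 * n * log t) with probability at most t**-4; dividing by the
    sum of sensitivities maps that deviation onto mu.
    """
    if num_pulls == 0:
        return math.inf
    return math.sqrt(2.0 * num_pulls * math.log(max(t, 2))) / sensitivity_sum


def run(mu, sens, horizon, seed=0):
    """Simulate the hypothetical pooled-UCB allocation; return (pulls per arm, total reward)."""
    assert len(sens) <= len(mu), "each agent pulls a distinct arm per round"
    assert all(0.0 < s <= 1.0 for s in sens) and all(0.0 <= x <= 1.0 for x in mu)
    rng = random.Random(seed)
    k = len(mu)
    reward_sum = [0.0] * k
    sens_sum = [0.0] * k
    pulls = [0] * k
    # Agents ordered from most to least sensitive.
    agents = sorted(range(len(sens)), key=lambda j: sens[j], reverse=True)
    total = 0.0
    for t in range(1, horizon + 1):
        ucb = [pooled_estimate(reward_sum[i], sens_sum[i])
               + confidence_width(pulls[i], sens_sum[i], t) for i in range(k)]
        # Keep the len(sens) arms with the highest UCB, highest first.
        arms = sorted(range(k), key=lambda i: ucb[i], reverse=True)[:len(sens)]
        # Most sensitive agent takes the highest-UCB arm, and so on down.
        for j, i in zip(agents, arms):
            r = 1.0 if rng.random() < sens[j] * mu[i] else 0.0
            reward_sum[i] += r
            sens_sum[i] += sens[j]
            pulls[i] += 1
            total += r
    return pulls, total


pulls, total = run(mu=[0.9, 0.8, 0.1, 0.05], sens=[0.9, 0.4], horizon=2000, seed=1)
print(pulls, total)
```

Because sum(r_k) / sum(s_k) = sum(s_k * (r_k / s_k)) / sum(s_k), the pooled estimate is a sensitivity-weighted average of the per-pull unbiased estimates r_k / s_k: low-sensitivity agents still contribute, but count for less. Pairing the most sensitive agent with the highest index maximizes the sum of s_j * UCB over distinct-arm assignments (rearrangement inequality). This is one plausible coordination rule, not necessarily the one the paper analyzes.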
