Combining Diverse Information for Coordinated Action: Stochastic Bandit Algorithms for Heterogeneous Agents
Lucia Gordon, Esther Rolf, Milind Tambe
arXiv - CS - Multiagent Systems, 2024-08-06
DOI: arxiv-2408.03405 (https://doi.org/arxiv-2408.03405)
Citations: 0
Abstract
Stochastic multi-agent multi-armed bandits typically assume that the rewards
from each arm follow a fixed distribution, regardless of which agent pulls the
arm. However, in many real-world settings, rewards can depend on the
sensitivity of each agent to their environment. In medical screening, disease
detection rates can vary by test type; in preference matching, rewards can
depend on user preferences; and in environmental sensing, observation quality
can vary across sensors. Because past work does not specify how to allocate
agents with heterogeneous but known sensitivities of this kind in a stochastic
bandit setting, we introduce Min-Width, a UCB-style algorithm that aggregates
information from diverse agents. In doing so, we address the joint challenges
of (i) aggregating the rewards, which follow different distributions for each
agent-arm pair, and (ii) coordinating the assignments of agents to arms.
Min-Width facilitates efficient collaboration among heterogeneous agents,
exploiting the known structure in the agents' reward functions to weight their
rewards accordingly. We analyze the regret of Min-Width and conduct
pseudo-synthetic and fully synthetic experiments to study the performance of
different levels of information sharing. Our results confirm that the gains from
modeling agent heterogeneity tend to be greater when the sensitivities are more
varied across agents, while combining more information does not always improve
performance.
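
The abstract does not give the Min-Width update rule, so the following is only an illustrative sketch of how rewards from agents with known sensitivities might be pooled into one UCB-style index. It is not the authors' algorithm. It assumes a hypothetical multiplicative model in which agent i pulling arm k receives a reward in [0, 1] with mean s_i * mu_k, where s_i in (0, 1] is the known sensitivity. Under that assumption each rescaled reward r / s_i is an unbiased estimate of mu_k with range 1 / s_i, and weighting observations in proportion to s_i^2 minimizes the Hoeffding confidence width of the pooled mean. The function names and the delta parameter are likewise assumptions made for illustration.

```python
import math


def pooled_estimate(observations, delta=0.05):
    """Pool rewards for a single arm across agents of known sensitivity.

    observations: list of (reward, sensitivity) pairs, reward in [0, 1],
    sensitivity in (0, 1]. Assumed model (not from the paper):
    E[reward] = sensitivity * mu for the arm's base mean mu.
    Returns (estimate of mu, Hoeffding confidence width).
    """
    if not observations:
        # No data yet: the estimate is uninformative, the width unbounded.
        return 0.0, float("inf")
    # Weight each rescaled reward r / s by s^2, so it contributes r * s.
    total_weight = sum(s * s for _, s in observations)
    estimate = sum(r * s for r, s in observations) / total_weight
    # Hoeffding bound for the weighted mean of variables with range 1 / s.
    width = math.sqrt(math.log(2.0 / delta) / (2.0 * total_weight))
    return estimate, width


def ucb_index(observations, delta=0.05):
    """Optimistic index: pooled estimate plus its confidence width."""
    estimate, width = pooled_estimate(observations, delta)
    return estimate + width
```

In this sketch an observation from a low-sensitivity agent shrinks the confidence width less than one from a high-sensitivity agent, which is one way to read the abstract's remark that rewards are weighted according to the known structure. How agents are then assigned to arms is a separate coordination step that the sketch does not attempt.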