Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett
{"title":"A General Framework for Clustering and Distribution Matching with Bandit Feedback","authors":"Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett","doi":"arxiv-2409.05072","DOIUrl":null,"url":null,"abstract":"We develop a general framework for clustering and distribution matching\nproblems with bandit feedback. We consider a $K$-armed bandit model where some\nsubset of $K$ arms is partitioned into $M$ groups. Within each group, the\nrandom variable associated to each arm follows the same distribution on a\nfinite alphabet. At each time step, the decision maker pulls an arm and\nobserves its outcome from the random variable associated to that arm.\nSubsequent arm pulls depend on the history of arm pulls and their outcomes. The\ndecision maker has no knowledge of the distributions of the arms or the\nunderlying partitions. The task is to devise an online algorithm to learn the\nunderlying partition of arms with the least number of arm pulls on average and\nwith an error probability not exceeding a pre-determined value $\\delta$.\nSeveral existing problems fall under our general framework, including finding\n$M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms\nbelong to our general framework. We derive a non-asymptotic lower bound on the\naverage number of arm pulls for any online algorithm with an error probability\nnot exceeding $\\delta$. Furthermore, we develop a computationally-efficient\nonline algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm,\nand show that the average number of arm pulls of our algorithm asymptotically\nmatches that of the lower bound. Our refined analysis also uncovers a novel\nbound on the speed at which the average number of arm pulls of our algorithm\nconverges to the fundamental limit as $\\delta$ vanishes.","PeriodicalId":501082,"journal":{"name":"arXiv - MATH - Information Theory","volume":"42 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
We develop a general framework for clustering and distribution matching
problems with bandit feedback. We consider a $K$-armed bandit model where some
subset of $K$ arms is partitioned into $M$ groups. Within each group, the
random variable associated with each arm follows the same distribution on a
finite alphabet. At each time step, the decision maker pulls an arm and
observes an outcome drawn from the random variable associated with that arm.
Subsequent arm pulls depend on the history of arm pulls and their outcomes. The
decision maker has no knowledge of the distributions of the arms or the
underlying partitions. The task is to devise an online algorithm to learn the
underlying partition of arms with the least number of arm pulls on average and
with an error probability not exceeding a pre-determined value $\delta$.
Several existing problems fall under our general framework, including finding
$M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms.
We derive a non-asymptotic lower bound on the
average number of arm pulls for any online algorithm with an error probability
not exceeding $\delta$. Furthermore, we develop a computationally efficient
online algorithm based on the Track-and-Stop method and the Frank--Wolfe algorithm,
and show that the average number of arm pulls of our algorithm asymptotically
matches the lower bound. Our refined analysis also uncovers a novel
bound on the speed at which the average number of arm pulls of our algorithm
converges to the fundamental limit as $\delta$ vanishes.
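To make the setup concrete, below is a minimal Python sketch of the bandit model described in the abstract, paired with a naive uniform-sampling baseline that stops once the empirical clustering of the arms has been stable for many rounds. All names (`make_instance`, `naive_uniform_baseline`), the crude confidence radius, and the stability-based stopping rule are illustrative assumptions; this is not the authors' Track-and-Stop/Frank--Wolfe algorithm, and it carries no formal $\delta$-level error guarantee.

```python
import numpy as np

# Illustrative simulation of the clustering-with-bandit-feedback setup.
# K arms over a finite alphabet; arms within the same hidden group share a distribution.
# The sampling/stopping rule below is a naive baseline, NOT the paper's algorithm.

rng = np.random.default_rng(0)

def make_instance(K=6, M=2, alphabet_size=3):
    """Hidden partition of K arms into M groups; each group gets a random distribution."""
    groups = np.repeat(np.arange(M), K // M)          # hidden ground-truth partition
    group_dists = rng.dirichlet(np.ones(alphabet_size), size=M)
    return groups, group_dists[groups]                # per-arm distributions, shape (K, alphabet_size)

def pull(arm_dists, k):
    """One bandit feedback sample: a symbol drawn from arm k's distribution."""
    return rng.choice(arm_dists.shape[1], p=arm_dists[k])

def cluster_by_threshold(emp, radius):
    """Greedy grouping of arms whose empirical distributions are within the confidence radii."""
    K = emp.shape[0]
    labels = -np.ones(K, dtype=int)
    next_label = 0
    for k in range(K):
        if labels[k] >= 0:
            continue
        labels[k] = next_label
        for j in range(k + 1, K):
            if labels[j] < 0 and np.max(np.abs(emp[k] - emp[j])) <= radius[k] + radius[j]:
                labels[j] = next_label
        next_label += 1
    return labels

def naive_uniform_baseline(arm_dists, delta=0.05, stable_rounds=50, max_rounds=100_000):
    """Pull every arm once per round; stop when the empirical clustering is stable (heuristic)."""
    K, A = arm_dists.shape
    counts = np.zeros((K, A))
    prev, streak = None, 0
    for t in range(1, max_rounds + 1):
        for k in range(K):
            counts[k, pull(arm_dists, k)] += 1
        emp = counts / t
        # Crude concentration radius shrinking like sqrt(log(1/delta) / t); illustrative only.
        radius = np.sqrt(np.log(2 * K * A / delta) / (2 * t)) * np.ones(K)
        labels = cluster_by_threshold(emp, radius)
        streak = streak + 1 if prev is not None and np.array_equal(labels, prev) else 0
        prev = labels
        if streak >= stable_rounds:
            return labels, t * K
    return prev, max_rounds * K

groups, arm_dists = make_instance()
labels, total_pulls = naive_uniform_baseline(arm_dists)
print("hidden partition :", groups)
print("estimated labels :", labels)
print("total arm pulls  :", total_pulls)
```

Unlike this uniform baseline, the algorithm described in the abstract adapts its sampling proportions (Track-and-Stop with Frank--Wolfe updates), which is what allows its average number of arm pulls to match the lower bound asymptotically as $\delta$ vanishes.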