A General Framework for Clustering and Distribution Matching with Bandit Feedback

Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett
{"title":"带 Bandit 反馈的聚类和分布匹配通用框架","authors":"Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett","doi":"arxiv-2409.05072","DOIUrl":null,"url":null,"abstract":"We develop a general framework for clustering and distribution matching\nproblems with bandit feedback. We consider a $K$-armed bandit model where some\nsubset of $K$ arms is partitioned into $M$ groups. Within each group, the\nrandom variable associated to each arm follows the same distribution on a\nfinite alphabet. At each time step, the decision maker pulls an arm and\nobserves its outcome from the random variable associated to that arm.\nSubsequent arm pulls depend on the history of arm pulls and their outcomes. The\ndecision maker has no knowledge of the distributions of the arms or the\nunderlying partitions. The task is to devise an online algorithm to learn the\nunderlying partition of arms with the least number of arm pulls on average and\nwith an error probability not exceeding a pre-determined value $\\delta$.\nSeveral existing problems fall under our general framework, including finding\n$M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms\nbelong to our general framework. We derive a non-asymptotic lower bound on the\naverage number of arm pulls for any online algorithm with an error probability\nnot exceeding $\\delta$. Furthermore, we develop a computationally-efficient\nonline algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm,\nand show that the average number of arm pulls of our algorithm asymptotically\nmatches that of the lower bound. Our refined analysis also uncovers a novel\nbound on the speed at which the average number of arm pulls of our algorithm\nconverges to the fundamental limit as $\\delta$ vanishes.","PeriodicalId":501082,"journal":{"name":"arXiv - MATH - Information Theory","volume":"42 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A General Framework for Clustering and Distribution Matching with Bandit Feedback\",\"authors\":\"Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, Jonathan Scarlett\",\"doi\":\"arxiv-2409.05072\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We develop a general framework for clustering and distribution matching\\nproblems with bandit feedback. We consider a $K$-armed bandit model where some\\nsubset of $K$ arms is partitioned into $M$ groups. Within each group, the\\nrandom variable associated to each arm follows the same distribution on a\\nfinite alphabet. At each time step, the decision maker pulls an arm and\\nobserves its outcome from the random variable associated to that arm.\\nSubsequent arm pulls depend on the history of arm pulls and their outcomes. The\\ndecision maker has no knowledge of the distributions of the arms or the\\nunderlying partitions. The task is to devise an online algorithm to learn the\\nunderlying partition of arms with the least number of arm pulls on average and\\nwith an error probability not exceeding a pre-determined value $\\\\delta$.\\nSeveral existing problems fall under our general framework, including finding\\n$M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms\\nbelong to our general framework. We derive a non-asymptotic lower bound on the\\naverage number of arm pulls for any online algorithm with an error probability\\nnot exceeding $\\\\delta$. 
Furthermore, we develop a computationally-efficient\\nonline algorithm based on the Track-and-Stop method and Frank--Wolfe algorithm,\\nand show that the average number of arm pulls of our algorithm asymptotically\\nmatches that of the lower bound. Our refined analysis also uncovers a novel\\nbound on the speed at which the average number of arm pulls of our algorithm\\nconverges to the fundamental limit as $\\\\delta$ vanishes.\",\"PeriodicalId\":501082,\"journal\":{\"name\":\"arXiv - MATH - Information Theory\",\"volume\":\"42 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - MATH - Information Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.05072\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.05072","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a $K$-armed bandit model in which some subset of the $K$ arms is partitioned into $M$ groups. Within each group, the random variable associated with each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes an outcome drawn from the random variable associated with that arm; subsequent arm pulls depend on the history of pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or of the underlying partition. The task is to devise an online algorithm that learns the underlying partition of the arms with the fewest arm pulls on average and with an error probability not exceeding a predetermined value $\delta$. Several existing problems fall under our general framework, including finding $M$ pairs of arms, odd arm identification, and $M$-ary clustering of $K$ arms. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm whose error probability does not exceed $\delta$. Furthermore, we develop a computationally efficient online algorithm based on the Track-and-Stop method and the Frank--Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches the lower bound. Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as $\delta$ vanishes.
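
To make the problem setup concrete, the following minimal Python sketch simulates the model described in the abstract: $K$ arms partitioned into $M$ groups, with every arm in a group sharing the same distribution over a finite alphabet, and a learner that pulls arms and estimates the partition from the observed outcomes. It is illustrative only and is not the paper's algorithm: the specific partition, the group distributions, the round-robin sampling, and the total-variation clustering threshold are all hypothetical choices made for the example.

```python
import numpy as np

# Illustrative sketch (not the authors' Track-and-Stop/Frank--Wolfe method):
# K arms are partitioned into M groups, and all arms in a group share the
# same distribution over a finite alphabet. The learner pulls arms, builds
# empirical distributions, and clusters arms whose empirical distributions
# are close in total-variation distance.

rng = np.random.default_rng(0)

K, M = 6, 2                       # number of arms and groups
alphabet_size = 3                 # finite outcome alphabet {0, 1, 2}

# Hidden ground truth (unknown to the learner): partition and distributions.
true_partition = np.array([0, 0, 0, 1, 1, 1])            # arm -> group
group_dists = rng.dirichlet(np.ones(alphabet_size), M)   # one distribution per group

def pull(arm):
    """Sample one outcome from the (unknown) distribution of `arm`."""
    return rng.choice(alphabet_size, p=group_dists[true_partition[arm]])

# Online loop: naive round-robin sampling in place of a tracking rule.
counts = np.zeros((K, alphabet_size))
pulls = np.zeros(K, dtype=int)
for t in range(5000):
    arm = t % K
    counts[arm, pull(arm)] += 1
    pulls[arm] += 1

emp = counts / pulls[:, None]     # empirical distribution of each arm

# Greedy clustering by total-variation distance (hypothetical threshold).
threshold = 0.1
labels = -np.ones(K, dtype=int)
next_label = 0
for i in range(K):
    if labels[i] < 0:
        labels[i] = next_label
        for j in range(i + 1, K):
            if labels[j] < 0 and 0.5 * np.abs(emp[i] - emp[j]).sum() < threshold:
                labels[j] = next_label
        next_label += 1

print("estimated partition:", labels)
```

Roughly speaking, the paper's approach would replace the fixed-horizon round-robin sampling above with a tracking rule that steers the empirical pull proportions toward optimal weights computed via the Frank--Wolfe algorithm, and would stop adaptively according to a rule calibrated so that the error probability does not exceed the target $\delta$.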