On Restless Linear Bandits
Azadeh Khaleghi
IEEE Transactions on Information Theory, vol. 71, no. 4, pp. 2982-2990, 2025 (published 2025-01-23)
DOI: 10.1109/TIT.2025.3533299
https://ieeexplore.ieee.org/document/10851303/
Citations: 0
Abstract
A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^{d}$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_{t},\; t \in \mathbb{N})$ which gives rise to payoffs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with i.i.d. noise and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_{t}$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_{t}$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\big(\sqrt{d n \,\mathrm{polylog}(n)}\big)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\,\theta_{t}$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee’s coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\,\theta_{t}$.
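To make the setting concrete, the following Python sketch simulates a restless linear bandit and runs a simplified optimistic (UCB-style) learner against it. It is not LinMix-UCB, and every concrete choice is a hypothetical assumption for illustration. The parameters $\theta_t$ follow a finite-state Markov chain `P` over three vectors `states`; a finite chain with strictly positive transitions, started from its stationary law, is a standard example of a stationary, exponentially $\varphi$-mixing sequence. The arms are a finite set of unit vectors, payoffs are the noiseless inner products $\langle x_t, \theta_t\rangle$, and the ridge estimate of $\mathbb{E}\,\theta_t$ is updated only every `gap` rounds, a crude stand-in for the coupling-based selection of near-independent samples. Because the arms form a finite set, the oracle here plays the best arm against $\mathbb{E}\,\theta_t$ rather than a multiple of it.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary law of an irreducible transition matrix P (left eigenvector for eigenvalue 1)."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()


def simulate_restless_linear_bandit(n=5000, gap=5, lam=1.0, width_scale=1.0, seed=0):
    """Hypothetical sketch: UCB on a finite arm set with a Markov-modulated parameter theta_t.

    Returns (cumulative regret vs. the oracle arm, E[theta_t], final ridge estimate).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical parameter values; theta_t is always one of these rows.
    states = np.array([[1.0, 0.2, 0.0],
                       [0.6, 0.8, 0.1],
                       [0.9, -0.3, 0.4]])
    d = states.shape[1]
    # Strictly positive transitions: irreducible and aperiodic, hence exponentially phi-mixing.
    P = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
    pi = stationary_distribution(P)
    mean_theta = pi @ states  # E[theta_t] under the stationary law

    # Hypothetical finite action set: 20 random unit vectors.
    arms = rng.normal(size=(20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    oracle_arm = arms[np.argmax(arms @ mean_theta)]  # best fixed arm against E[theta_t]

    V = lam * np.eye(d)  # regularized design matrix of the retained samples
    b = np.zeros(d)      # sum of reward * action over the retained samples
    s = rng.choice(len(states), p=pi)  # start the chain in stationarity
    regret = 0.0
    for t in range(n):
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        # Ellipsoidal confidence width ||x||_{V^{-1}} for every arm.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        x = arms[np.argmax(arms @ theta_hat + width_scale * widths)]  # optimistic choice

        theta_t = states[s]
        reward = x @ theta_t
        regret += oracle_arm @ theta_t - reward

        # Keep only samples spaced `gap` rounds apart (crude proxy for near-independence).
        if t % gap == 0:
            V += np.outer(x, x)
            b += reward * x

        s = rng.choice(len(states), p=P[s])  # restless: theta evolves regardless of the action
    return regret, mean_theta, np.linalg.solve(V, b)


if __name__ == "__main__":
    regret, mean_theta, theta_hat = simulate_restless_linear_bandit(n=5000, gap=5)
    print("cumulative regret:", regret)
    print("E[theta_t]:", mean_theta)
    print("estimate:", theta_hat)
```

Raising `gap` makes the retained samples closer to independent but discards more data, so the confidence ellipsoid shrinks more slowly; this is only a rough illustration of why long-range dependence complicates the exploration-exploitation trade-off.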
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.