Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation

2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (DySPAN), Pub Date: 2010-04-01, DOI: 10.1109/DYSPAN.2010.5457857

Yi Gai, B. Krishnamachari, Rahul Jain

Citations: 225

Abstract

We consider the following fundamental problem in the context of channelized dynamic spectrum access. There are $M$ secondary users and $N \ge M$ orthogonal channels. Each secondary user requires a single channel for operation that does not conflict with the channels assigned to the other users. Due to geographic dispersion, each secondary user can potentially see different primary user occupancy behavior on each channel. Time is divided into discrete decision rounds. The throughput obtainable from spectrum opportunities on each user-channel combination over a decision period is modeled as an arbitrarily distributed non-negative random variable with bounded support but unknown mean, i.i.d. over time. The objective is to search for an allocation of channels for all users that maximizes the expected sum throughput. We formulate this problem as a combinatorial multi-armed bandit (MAB), in which each arm corresponds to a matching of the users to channels. Unlike most prior work on multi-armed bandits, this combinatorial formulation results in dependent arms. Moreover, the number of arms grows super-exponentially as the permutation $P(N,M)$. We present a novel matching-learning algorithm with polynomial storage and polynomial computation per decision period for this problem, and prove that it results in a regret (the gap between the expected sum-throughput obtained by a genie-aided perfect allocation and that obtained by this algorithm) that is uniformly upper-bounded for all time $n$ by a function that grows as $O(M^4 N \log n)$, i.e., polynomial in the number of unknown parameters and logarithmic in time. We also discuss how our results provide a non-trivial generalization of known theoretical results on multi-armed bandits.
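The abstract contrasts $P(N,M)$ dependent arms with a learner whose storage and per-round computation stay polynomial. The Python sketch below illustrates that idea under stated assumptions. It is not presented as the paper's algorithm or its exact index.

- It keeps one sample mean and one counter per user-channel pair, which is $2MN$ numbers instead of one statistic per matching.
- It adds a UCB-style exploration bonus with constant $(M+1)$. This constant is an assumption.
- It plays the matching that maximizes the sum of the per-pair indices.

Rewards are assumed to be Bernoulli, which fits the bounded, non-negative model. For brevity, `best_matching` enumerates all $P(N,M)$ matchings, so it is exponential. A polynomial implementation would use an assignment solver such as the Hungarian algorithm. The names `num_arms`, `best_matching` and `ucb_matching_learner` are hypothetical.

```python
import itertools
import math
import random


def num_arms(n_channels, n_users):
    """Number of arms P(N, M) = N! / (N - M)!: one arm per injective user->channel map."""
    return math.perm(n_channels, n_users)


def best_matching(weights):
    """Return the tuple (channel of user 0, channel of user 1, ...) maximizing the sum of
    weights[user][channel]. Brute force over all P(N, M) matchings, for brevity only;
    a polynomial-time assignment solver (e.g. the Hungarian algorithm) would replace it."""
    n_users, n_channels = len(weights), len(weights[0])
    best, best_val = None, -math.inf
    for chans in itertools.permutations(range(n_channels), n_users):
        val = sum(weights[u][c] for u, c in enumerate(chans))
        if val > best_val:
            best, best_val = chans, val
    return best


def ucb_matching_learner(means, horizon, seed=0):
    """Simulate `horizon` rounds. means[u][c] is the true (hidden) mean throughput of
    user u on channel c; each observation is Bernoulli(means[u][c]), an assumed
    reward model. Returns (cumulative pseudo-regret, per-pair observation counts)."""
    rng = random.Random(seed)
    n_users, n_channels = len(means), len(means[0])
    est = [[0.0] * n_channels for _ in range(n_users)]  # sample mean per pair
    cnt = [[0] * n_channels for _ in range(n_users)]    # observations per pair
    opt = best_matching(means)                           # genie-aided allocation
    opt_val = sum(means[u][c] for u, c in enumerate(opt))
    regret = 0.0
    for n in range(1, horizon + 1):
        # Per-pair UCB index (assumed form); unobserved pairs get +inf so each is tried.
        index = [[math.inf if cnt[u][c] == 0 else
                  est[u][c] + math.sqrt((n_users + 1) * math.log(n) / cnt[u][c])
                  for c in range(n_channels)] for u in range(n_users)]
        chans = best_matching(index)
        # Each user observes the throughput on its own assigned channel.
        for u, c in enumerate(chans):
            r = 1.0 if rng.random() < means[u][c] else 0.0
            cnt[u][c] += 1
            est[u][c] += (r - est[u][c]) / cnt[u][c]  # incremental mean update
        regret += opt_val - sum(means[u][c] for u, c in enumerate(chans))
    return regret, cnt


means = [[0.9, 0.1], [0.1, 0.9]]
print(num_arms(2, 2), best_matching(means))  # prints 2 (0, 1)
```

The gap between arms and stored statistics widens quickly. For example, `num_arms(10, 5)` is 30240 matchings, while the learner stores only $2 \cdot 5 \cdot 10 = 100$ numbers.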