Non-Bayesian Learning of Channel Sensing Order for Dynamic Spectrum Access Networks

Bowen Li, Panlong Yang, Jinlong Wang, Qi-hui Wu, Nan Xia

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2011-10-10. DOI: 10.1109/CyberC.2011.91. Citations: 2
Abstract
In this paper, we consider sequential channel sensing and accessing with unknown channel availability probabilities. In such a sequential channel sensing framework, it is critically important to arrange the channel sensing order properly at the beginning of each time slot, so as to balance the tradeoff between finding an idle channel as soon as possible in the current slot (exploitation, by preferentially sensing the channels that currently look best) and making the channel statistics more precise for subsequent sensing order arrangements (exploration, by sensing the channels that currently look suboptimal). To handle this tradeoff, we propose five online learning algorithms: Pure Exploitation (PE), Exploitation with Optimism Initial Estimation (EOIE), UCB-based Order (UCBO), ε-greedy Order, and SoftMax Order. The latter three extend classic algorithms for the multi-armed bandit problem: UCB1 [Auer_02], ε-greedy [Cesa_98], and SoftMax [Cesa_98]. Among these algorithms, UCBO is the only one we have proved to be a zero-regret strategy, meaning the system is guaranteed to converge to the optimal sensing order if enough slots are played. We then evaluate the algorithms by simulation. The results are somewhat surprising but meaningful. We find that the PE algorithm, which is regarded as a simple strategy with poor performance in traditional multi-armed bandit problems, performs very well in the order learning problem when the number of channels is small, and our proposed EOIE algorithm performs nearly optimally in all cases. In contrast, the UCB-based Order converges too slowly for practical application, although it provides a theoretical zero-regret guarantee, as we have proved.
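To make the UCB-based ordering idea concrete, here is a minimal sketch of how a UCB1-style index could drive the per-slot sensing order: each slot, channels are ranked by their estimated idle probability plus a UCB1 confidence bonus, then sensed in that order until an idle channel is found. This is an illustrative assumption, not the paper's exact UCBO algorithm; the channel idle probabilities `p`, the slot loop, and all function names are hypothetical.

```python
import math
import random

def ucb_sensing_order(means, counts, t):
    """Rank channels by a UCB1-style index (highest first); assumed, not the paper's exact rule."""
    def index(i):
        if counts[i] == 0:
            return float("inf")  # sense never-tried channels first
        return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(means)), key=index, reverse=True)

def run_slot(p, means, counts, t):
    """Sense channels in UCB order, update statistics, return the accessed channel or None."""
    for ch in ucb_sensing_order(means, counts, t):
        idle = random.random() < p[ch]  # simulated sensing outcome
        counts[ch] += 1
        means[ch] += (idle - means[ch]) / counts[ch]  # incremental mean update
        if idle:
            return ch  # access the first idle channel found this slot
    return None  # every channel was busy

if __name__ == "__main__":
    p = [0.2, 0.5, 0.8]  # hypothetical idle probabilities, unknown to the learner
    means, counts = [0.0] * len(p), [0] * len(p)
    for t in range(1, 10001):
        run_slot(p, means, counts, t)
    print("estimated idle probabilities:", [round(m, 2) for m in means])
```

Under this sketch, the confidence bonus shrinks as a channel is sensed more often, so the learned order gradually settles on sensing channels in decreasing estimated idle probability, which matches the zero-regret behavior claimed for UCBO (and also illustrates why convergence can be slow: the bonus decays only logarithmically in the slot count).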