Non-Bayesian Learning of Channel Sensing Order for Dynamic Spectrum Access Networks

Bowen Li, Panlong Yang, Jinlong Wang, Qi-hui Wu, Nan Xia

2011 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2011-10-10. DOI: 10.1109/CyberC.2011.91. Citations: 2
Abstract
In this paper, we consider sequential channel sensing and accessing with unknown channel availability probabilities. In such a sequential channel sensing framework, it is critically important to arrange the channel sensing order properly at the beginning of each time slot, so as to balance the tradeoff between finding an idle channel as soon as possible in the current slot (exploitation, by preferentially sensing the channels that currently look best) and making the channel statistics more precise for subsequent sensing order arrangements (exploration, by sensing the channels that currently look suboptimal). To handle this tradeoff, we propose five online learning algorithms: Pure Exploitation (PE), Exploitation with Optimism Initial Estimation (EOIE), UCB-based Order (UCBO), ε-greedy Order, and SoftMax Order. The latter three extend classic algorithms for the multi-armed bandit problem: UCB1 [Auer_02], ε-greedy [Cesa_98], and SoftMax [Cesa_98]. Among these algorithms, UCBO is the only one we have proved to be a zero-regret strategy, meaning the system is guaranteed to converge to the optimal sensing order if enough slots are played. We then evaluate the algorithms by simulation. The results are somewhat surprising but meaningful. We find that the PE algorithm, which is regarded as a simple strategy with poor performance in traditional multi-armed bandit problems, performs very well in the order learning problem when the number of channels is small, and our proposed EOIE algorithm performs nearly optimally in all cases. In contrast, the UCB-based Order converges too slowly for practical application, although it provides a theoretical zero-regret guarantee, as we have proved.
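To make the UCB-based ordering idea concrete, here is a minimal sketch of how a UCB1-style index could drive the per-slot sensing order: each slot, channels are ranked by their estimated idle probability plus a UCB1 confidence bonus, then sensed in that order until an idle channel is found. This is an illustrative assumption, not the paper's exact UCBO algorithm; the channel idle probabilities `p`, the slot loop, and all function names are hypothetical.

```python
import math
import random

def ucb_sensing_order(means, counts, t):
    """Rank channels by a UCB1-style index (highest first); assumed, not the paper's exact rule."""
    def index(i):
        if counts[i] == 0:
            return float("inf")  # sense never-tried channels first
        return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
    return sorted(range(len(means)), key=index, reverse=True)

def run_slot(p, means, counts, t):
    """Sense channels in UCB order, update statistics, return the accessed channel or None."""
    for ch in ucb_sensing_order(means, counts, t):
        idle = random.random() < p[ch]  # simulated sensing outcome
        counts[ch] += 1
        means[ch] += (idle - means[ch]) / counts[ch]  # incremental mean update
        if idle:
            return ch  # access the first idle channel found this slot
    return None  # every channel was busy

if __name__ == "__main__":
    p = [0.2, 0.5, 0.8]  # hypothetical idle probabilities, unknown to the learner
    means, counts = [0.0] * len(p), [0] * len(p)
    for t in range(1, 10001):
        run_slot(p, means, counts, t)
    print("estimated idle probabilities:", [round(m, 2) for m in means])
```

Under this sketch, the confidence bonus shrinks as a channel is sensed more often, so the learned order gradually settles on sensing channels in decreasing estimated idle probability, which matches the zero-regret behavior claimed for UCBO (and also illustrates why convergence can be slow: the bonus decays only logarithmically in the slot count).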