A. Alaya-Feki, B. Sayraç, É. Moulines, A. L. Cornec
IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference, 8 December 2008
DOI: 10.1109/GLOCOM.2008.ECP.594
Citations: 7
Opportunistic Spectrum Access: Online Search of Optimality
This paper presents an online tuning approach for the ad-hoc reinforcement learning algorithms used to solve the exploitation-exploration dilemma of opportunistic spectrum access in dynamic environments. These algorithms originate from a well-known problem in computer science, the multi-armed bandit (MAB) problem, and there is evidence that they are viable solutions for detecting and exploring white spaces in opportunistic spectrum access. Previous work (A. Ben Hadj Alaya-Feki et al., 2008) has shown that reinforcement learning solutions of the MAB problem are very sensitive to the statistical properties of wireless medium access and therefore need careful tuning as the wireless environment varies. This paper addresses the online tuning of these algorithms by proposing and assessing two approaches: (1) a meta-learning approach, in which a second learner (the meta-learner) learns the parameters of the base learner, and (2) the Exp3 algorithm, which has previously been proposed for dynamic tuning of MAB parameters in other contexts. Simulation results on an IEEE 802.11 medium access scenario show that one of the proposed meta-learning methods, the change-point detection method, performs much better than the other methods.
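The abstract names Exp3 as one of the two tuning approaches but gives no details of how it is applied. The following is a minimal sketch of the standard Exp3 update (Auer et al., 2002), shown here choosing channels directly. The reward is 1 when the sensed channel is idle (a white space) and 0 otherwise. Everything specific is an assumption: the Bernoulli idle probabilities, `gamma=0.1`, and the names `Exp3` and `run_channel_selection` are hypothetical, and the paper's IEEE 802.11 scenario is not reproduced. In the paper, Exp3 tunes the parameters of a base MAB learner. Pointing the same class at a set of candidate parameter values, instead of at channels, would be closer to that role, but the meta-level reward the authors use is not stated here.

```python
import math
import random


class Exp3:
    """Exp3 (Auer et al., 2002) over K arms with rewards in [0, 1].

    Hypothetical helper for illustration. Weights are stored in log
    space so long runs do not overflow.
    """

    def __init__(self, n_arms, gamma, rng):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.k = n_arms
        self.gamma = gamma
        self.rng = rng
        self.log_w = [0.0] * n_arms

    def probabilities(self):
        # Normalized exponential weights mixed with uniform exploration:
        # p_i = (1 - gamma) * w_i / sum(w) + gamma / K
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        total = sum(w)
        return [(1.0 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self):
        # Sample an arm and return its selection probability for the update.
        probs = self.probabilities()
        arm = self.rng.choices(range(self.k), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, reward, prob):
        # Importance-weighted (unbiased) estimate, applied to the played arm only:
        # w_arm <- w_arm * exp(gamma * x_hat / K)
        x_hat = reward / prob
        self.log_w[arm] += self.gamma * x_hat / self.k


def run_channel_selection(idle_probs, horizon, gamma=0.1, seed=0):
    """Hypothetical OSA loop: each round, sense one channel; reward 1 if idle."""
    rng = random.Random(seed)
    agent = Exp3(len(idle_probs), gamma, rng)
    counts = [0] * len(idle_probs)
    for _ in range(horizon):
        arm, prob = agent.select()
        reward = 1.0 if rng.random() < idle_probs[arm] else 0.0
        agent.update(arm, reward, prob)
        counts[arm] += 1
    return agent, counts


if __name__ == "__main__":
    agent, counts = run_channel_selection([0.2, 0.5, 0.9], horizon=3000)
    print("selections per channel:", counts)
    print("final probabilities:", [round(p, 3) for p in agent.probabilities()])
```

Plain Exp3 competes with the best single arm in hindsight. It can therefore be slow to react when the best channel changes abruptly. Variants such as Exp3.S, or restarts triggered by a change-point detector, are aimed at that non-stationary case.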