Using Markov decision process in cognitive radio networks towards the optimal reward
Said Lakhal, Z. Guennoun
Proceedings of the 2017 International Conference on Smart Digital Environment, 2017-07-21
https://doi.org/10.1145/3128128.3128143
Learning is an indispensable phase in the cognition cycle of a cognitive radio network: it relates executed actions to their estimated rewards. Through this phase, the agent learns from past experience to improve its actions in subsequent interventions. The literature offers several approaches to machine learning; among them, reinforcement learning searches for the optimal policy, the one that ensures the maximum reward. This work presents an approach based on a reinforcement learning model, namely the Markov decision process, to maximize the sum of the transfer rates of all secondary users. The formulation defines every notion relevant to an environment with a finite set of states: the agent, the states, the actions allowed in a given state, the reward obtained after executing an action, and the optimal policy. After implementation, we observe a correlation between the initial policy and the optimal policy, and we improve performance relative to a previous work.
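The abstract does not specify the authors' state space, reward values, or solution method. As a minimal sketch of what computing an optimal policy for a finite Markov decision process can look like, the following uses standard value iteration on a hypothetical two-state channel model. The state and action names, transition probabilities, rewards, and discount factor are all illustrative assumptions, not values from the paper.

```python
from typing import Dict, List, Tuple

# P[state][action] -> list of (probability, next_state, reward).
# Hypothetical toy model: state 0 = channel busy, state 1 = channel idle.
# Rewards stand in for a secondary user's transfer rate (arbitrary units).
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

TOY_MDP: Transitions = {
    0: {
        "wait": [(0.7, 0, 0.0), (0.3, 1, 0.0)],
        "switch": [(0.8, 1, -1.0), (0.2, 0, -1.0)],  # -1.0 = assumed switching cost
    },
    1: {
        "transmit": [(0.9, 1, 2.0), (0.1, 0, 2.0)],
        "wait": [(1.0, 1, 0.0)],
    },
}


def q_value(outcomes, values, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Return (state values, greedy policy) for a finite MDP."""
    values = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected return over allowed actions.
            best = max(q_value(out, values, gamma) for out in P[s].values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # The optimal policy picks, in each state, the action with the highest value.
    policy = {
        s: max(P[s], key=lambda a: q_value(P[s][a], values, gamma)) for s in P
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration(TOY_MDP)
    print(values)
    print(policy)
```

In this toy model the agent, the states, the actions allowed in each state, and the reward after each action are all explicit entries of `TOY_MDP`, which mirrors the list of notions in the abstract. A model for several secondary users would need a larger state space and a reward that sums their rates.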