Call Admission Control in Wireless DS-CDMA Systems using Actor-Critic Reinforcement Learning
Pitipong Chanloha, W. Usaha
2007 2nd International Symposium on Wireless Pervasive Computing (ISWPC), 2007-04-16. DOI: 10.1109/ISWPC.2007.342590
This paper addresses the call admission control (CAC) problem for multiple services in the uplink of a cellular direct-sequence code division multiple access (DS-CDMA) system, taking into account the physical-layer channel and the receiver structure at the base station. The problem is formulated as a semi-Markov decision process (SMDP) with constraints on the blocking probabilities and the signal-to-interference ratio (SIR). The objective is to find a CAC policy that maximizes throughput while satisfying these quality-of-service (QoS) constraints. To obtain a near-optimal CAC policy, an existing online decision-making algorithm from prior work, based on an actor-critic with temporal-difference (TD) learning, is modified by parameterizing the reward signal to handle the QoS constraints. The proposed algorithm avoids the computational complexity of conventional dynamic programming techniques.
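The following is a minimal sketch of the general idea, not the authors' algorithm: a tabular actor-critic that learns accept/reject decisions at call arrivals using average-reward SMDP TD errors (reward minus the average-reward rate times the sojourn time, plus the change in relative value). Several assumptions are made for illustration. The SIR constraint is replaced by a hypothetical linear load budget `CAPACITY`, the blocking constraints are handled by a hypothetical per-class penalty `penalty[k]` subtracted from the reward whenever a class-k call is blocked (one possible way to parameterize the reward signal), and all traffic parameters are made-up values.

```python
import math
import random

# Toy model; every value below is a hypothetical choice, not taken from the paper.
CAPACITY = 10          # load budget standing in for the SIR constraint
LOAD = (1, 3)          # load units per call: class 0 (voice-like), class 1 (data-like)
ARRIVAL = (3.0, 1.0)   # Poisson arrival rate of each class
SERVICE = (1.0, 0.5)   # per-call exponential departure rate of each class
REWARD = (1.0, 4.0)    # throughput reward for admitting one call of each class


def sigmoid(x):
    """Logistic function with the input clamped to avoid overflow in math.exp."""
    x = max(-50.0, min(50.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def feasible(n, k):
    """True if admitting a class-k call keeps the total load within CAPACITY."""
    return n[0] * LOAD[0] + n[1] * LOAD[1] + LOAD[k] <= CAPACITY


def next_arrival(n, rng):
    """Simulate departures until the next arrival; return (state, class, elapsed time)."""
    n = list(n)
    elapsed = 0.0
    while True:
        rates = [ARRIVAL[0], ARRIVAL[1], n[0] * SERVICE[0], n[1] * SERVICE[1]]
        total = sum(rates)
        elapsed += rng.expovariate(total)
        u = rng.random() * total
        if u < rates[0]:
            return tuple(n), 0, elapsed
        if u < rates[0] + rates[1]:
            return tuple(n), 1, elapsed
        if u < rates[0] + rates[1] + rates[2]:
            n[0] -= 1  # a class-0 call departs
        else:
            n[1] -= 1  # a class-1 call departs


def train(penalty, steps=100_000, alpha=0.05, beta=0.05, seed=0):
    """Tabular actor-critic with average-reward SMDP TD errors.

    penalty[k] is the reward parameter charged when a class-k call is blocked.
    Returns (actor logits, per-class blocking estimates,
    penalized average reward per unit time, largest load observed).
    """
    rng = random.Random(seed)
    h = {}       # critic: relative value of each decision state (n0, n1, k)
    theta = {}   # actor: logit of P(accept) in each decision state
    total_reward = total_time = rho = 0.0
    arrivals, blocked = [0, 0], [0, 0]
    max_load = 0
    n, k, _ = next_arrival((0, 0), rng)
    for _ in range(steps):
        s = (n[0], n[1], k)
        p = sigmoid(theta.get(s, 0.0))
        can_accept = feasible(n, k)
        accept = can_accept and rng.random() < p  # infeasible calls are always rejected
        arrivals[k] += 1
        if accept:
            r = REWARD[k]
            n = (n[0] + 1, n[1]) if k == 0 else (n[0], n[1] + 1)
        else:
            r = -penalty[k]
            blocked[k] += 1
        max_load = max(max_load, n[0] * LOAD[0] + n[1] * LOAD[1])
        n_next, k_next, tau = next_arrival(n, rng)
        total_reward += r
        total_time += tau
        rho = total_reward / total_time  # penalized average reward per unit time
        s_next = (n_next[0], n_next[1], k_next)
        # SMDP TD error: reward minus rho times sojourn time, plus the value difference
        delta = r - rho * tau + h.get(s_next, 0.0) - h.get(s, 0.0)
        h[s] = h.get(s, 0.0) + alpha * delta
        if can_accept:
            # Policy-gradient step; d/dtheta log pi(a|s) = a - p for a Bernoulli-sigmoid policy
            theta[s] = theta.get(s, 0.0) + beta * delta * ((1.0 if accept else 0.0) - p)
        n, k = n_next, k_next
    blocking = [b / a if a else 0.0 for b, a in zip(blocked, arrivals)]
    return theta, blocking, rho, max_load
```

For example, `train(penalty=(0.0, 2.0))` returns the learned acceptance logits, the measured blocking probability of each class, the penalized average reward rate, and the largest load seen. Raising `penalty[1]` and re-running should tend to push the learned policy toward admitting more class-1 calls, so sweeping or slowly adapting the penalties is one way to search for a setting that meets a blocking target; this tuning loop is an assumption of the sketch, not a procedure stated in the abstract.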