Reinforcement Learning Intersection Controller
Gulnur Tolebi, N. S. Dairbekov, D. Kurmankhojayev, Ravil Mussabayev
2018 14th International Conference on Electronics Computer and Computation (ICECCO), 2018-11-01
DOI: 10.1109/ICECCO.2018.8634692 (https://doi.org/10.1109/ICECCO.2018.8634692)
This paper presents an online, model-free adaptive traffic signal controller for an isolated intersection using a Reinforcement Learning (RL) approach. We base our solution on the Q-learning algorithm with action-value approximation. In contrast with other studies in the field, we use the queue length in addition to the average delay as a measure of performance. The state aggregates the number of queuing vehicles and the green phase duration in the four directions. Phase durations are set to precise values for the non-conflicting directions, so the cycle length is not fixed. Finally, we analyze and update the equilibrium and queue-reduction terms in our previous equation for the immediate reward, and we also test a delay-based reward in the same control system. The performance of the proposed method is compared with an optimal symmetric fixed signal plan.
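To make the idea of Q-learning with action-value approximation concrete, here is a minimal sketch using a linear approximator. It is not the authors' formulation: the two-action set (keep or switch the green phase), the feature vector, the learning rate `alpha`, the discount factor `gamma`, and all function and class names are assumptions chosen for illustration. The abstract does not give the reward equation, so the sketch takes the reward as an input.

```python
import numpy as np

# Hypothetical action set: 0 = keep the current green phase, 1 = switch phase.
N_ACTIONS = 2


def features(queues, green_elapsed):
    """Build a state vector from four queue lengths, the elapsed green
    time, and a constant bias term (an assumed encoding)."""
    return np.array([*queues, green_elapsed, 1.0], dtype=float)


class LinearQ:
    """Q(s, a) is approximated as w[a] . phi(s), one weight row per action."""

    def __init__(self, n_features, n_actions, alpha=0.01, gamma=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def q_values(self, phi):
        return self.w @ phi

    def act(self, phi, epsilon, rng):
        """Epsilon-greedy action selection."""
        if rng.random() < epsilon:
            return int(rng.integers(self.w.shape[0]))
        return int(np.argmax(self.q_values(phi)))

    def update(self, phi, action, reward, phi_next):
        """One Q-learning step: move w[action] along the TD error."""
        target = reward + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[action]
        self.w[action] += self.alpha * td_error * phi
        return td_error
```

A single update might look like the following, with a placeholder reward equal to the negative total queue after the action (again an assumption, standing in for the paper's reward terms):

```python
agent = LinearQ(n_features=6, n_actions=N_ACTIONS)
phi = features([4, 2, 7, 1], green_elapsed=10.0)
phi_next = features([3, 2, 8, 1], green_elapsed=15.0)
agent.update(phi, action=0, reward=-14.0, phi_next=phi_next)
```

In practice the features would likely need scaling to keep the linear updates stable.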