Reinforcement Learning Intersection Controller

Gulnur Tolebi, N. S. Dairbekov, D. Kurmankhojayev, Ravil Mussabayev
DOI: 10.1109/ICECCO.2018.8634692
Published in: 2018 14th International Conference on Electronics Computer and Computation (ICECCO), November 2018
Citations: 3

Abstract

This paper presents an online, model-free adaptive traffic signal controller for an isolated intersection using a Reinforcement Learning (RL) approach. We base our solution on the Q-learning algorithm with action-value approximation. In contrast with other studies in the field, we use the queue length in addition to the average delay as a measure of performance. The number of queuing vehicles and the green-phase duration in four directions are aggregated to represent a state. The duration of a phase is a precise value for the non-conflicting directions; therefore, the cycle length is not fixed. Finally, we analyze and update the equilibrium and queue-reduction terms in our previous immediate-reward equation, and a delay-based reward is also tested in the given control system. The performance of the proposed method is compared with an optimal symmetric fixed signal plan.
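The control loop described in the abstract can be sketched in code. This is a minimal, hypothetical illustration only: the paper's exact state encoding, approximation scheme, reward weights, and traffic simulator are not given here, so the class name, the coarse binning used as a stand-in for action-value approximation, and the simple `queue_reduction - delay` reward are all assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

class QLearningIntersectionController:
    """Sketch of an adaptive controller for an isolated 4-direction intersection.

    State: the four queue lengths plus the elapsed green-phase duration,
    discretized into bins (a crude stand-in for action-value approximation).
    Action: extend the current green phase or switch to the next
    non-conflicting phase, so the cycle length is not fixed.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, bin_size=5):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.bin_size = bin_size
        self.q = defaultdict(float)          # (state, action) -> value
        self.actions = ("extend", "switch")

    def encode(self, queues, green_elapsed):
        # Coarse discretization of the 4 queue lengths and the phase duration.
        return (tuple(q // self.bin_size for q in queues),
                green_elapsed // self.bin_size)

    def act(self, state):
        # Epsilon-greedy action selection over the two phase decisions.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def reward(self, queues_before, queues_after, delay):
        # Hypothetical immediate reward: a queue-reduction term minus an
        # average-delay penalty, loosely following the abstract's description.
        queue_reduction = sum(queues_before) - sum(queues_after)
        return queue_reduction - delay

    def update(self, state, action, r, next_state):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

In use, the controller would be driven by a traffic simulator each decision step: observe queues, pick an action, apply it, then feed the resulting queues and measured delay back through `reward` and `update`.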