Reinforcement Learning Agent under Partial Observability for Traffic Light Control in Presence of Gridlocks

Authors: Thanapapas Horsuwan, C. Aswakul
Venue: International Conference on Simulation of Urban Mobility
Published: 2019-08-13
DOI: 10.29007/BDGN
Citations: 3

Abstract
Bangkok is notorious for its chronic traffic congestion, caused by rapid urbanization and haphazard city planning. The Sathorn Road network is one of the most critical areas, where gridlocks are a normal occurrence during rush hours. This stems from the high demand imposed by the dense geographical placement of three large educational institutions, combined with insufficient link capacity along strictly constrained routes. Current solutions rely heavily on human traffic control expertise to prevent and disentangle gridlocks by consecutively releasing each queue-length spillback through inter-junction coordination. A calibrated dataset of the Sathorn Road network area for the microscopic road traffic simulation package SUMO (Simulation of Urban MObility) is provided by the Chula-Sathorn SUMO Simulator (Chula-SSS). In this paper, we use the Chula-SSS dataset, extended with additional vehicle flows and gridlocks, to further optimize the present traffic signal control policies through reinforcement learning by an artificial agent. Reinforcement learning has been successful in a variety of domains over the past few years. However, while a number of studies apply reinforcement learning to adaptive traffic light control, they often lack pragmatic consideration of deployment in the physical world, especially for traffic system infrastructure in developing countries, which is constrained by economic factors. The resulting limitation, that the agent can only partially observe the whole network state at any given time, is unavoidable and cannot be overlooked.
With such partial observability constraints, this paper reports an investigation of applying the Ape-X Deep Q-Network agent at the critical junction during the morning rush hours from 6 AM to 9 AM, a period in which gridlocks occasionally occur in practice. The obtained results show the potential value of the agent's ability to learn despite the physical limitations of the traffic light control at the considered intersection within the Sathorn gridlock area. This suggests further investigation of the agent's applicability to mitigating complex interconnected gridlocks in the future.
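The partial-observability setting described in the abstract can be illustrated with a deliberately simplified sketch. The paper itself trains an Ape-X Deep Q-Network in SUMO; the toy below replaces that with a single-actor tabular Q-learning loop on a hypothetical two-phase intersection. The queue dynamics, the coarse detector-bin observation, and the discharge-based reward are all assumptions made for illustration, not the authors' actual setup.

```python
import random

# Simplified stand-in for the paper's setup (assumption: the paper trains an
# Ape-X DQN in SUMO; here a single-actor tabular Q-learning loop on a toy
# two-phase intersection illustrates the same partial-observability idea).

PHASES = (0, 1)               # 0: north-south green, 1: east-west green
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1

def observe(queues):
    """Partial observation: only coarse queue-length bins at one junction,
    mimicking sparse detector infrastructure rather than full network state."""
    return tuple(min(q // 4, 3) for q in queues)

def step(queues, phase):
    """Toy dynamics: the green approach discharges up to 4 vehicles per step;
    the red approach accumulates 2. Reward = vehicles discharged this step."""
    ns, ew = queues
    if phase == 0:
        served = min(ns, 4)
        ns, ew = ns - served, ew + 2
    else:
        served = min(ew, 4)
        ew, ns = ew - served, ns + 2
    return (ns, ew), served

def train(episodes=300, horizon=30, seed=0):
    rng = random.Random(seed)
    q = {}  # (observation, phase) -> Q-value estimate
    for _ in range(episodes):
        queues = (rng.randint(0, 15), rng.randint(0, 15))
        for _ in range(horizon):
            s = observe(queues)
            if rng.random() < EPS:                     # epsilon-greedy
                a = rng.choice(PHASES)
            else:
                a = max(PHASES, key=lambda p: q.get((s, p), 0.0))
            queues, r = step(queues, a)
            s2 = observe(queues)
            target = r + GAMMA * max(q.get((s2, p), 0.0) for p in PHASES)
            q[(s, a)] = q.get((s, a), 0.0) + ALPHA * (target - q.get((s, a), 0.0))
    return q

q = train()
policy = lambda obs: max(PHASES, key=lambda p: q.get((obs, p), 0.0))
```

The full Ape-X architecture additionally decouples many parallel actors from a single learner through prioritized experience replay, and uses a deep Q-network in place of the table; the sketch above only captures the observation-action-reward interface that the partial-observability constraint shapes.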