Single-Agent Reinforcement Learning Model for Adaptive Traffic Signal Control in Urban Corridors
Authors: Qiang Li, Xiya Zhuang, Lishan Liu, Bokui Chen, Yi Zhang, Yingping Zhao
Journal: Journal of Advanced Transportation, Volume 2026, Issue 1
Published: 2026-03-24
DOI: 10.1155/atr/5134018
URL: https://onlinelibrary.wiley.com/doi/10.1155/atr/5134018
Impact factor: 1.8; JCR: Q2 (Engineering, Civil)
Citations: 0
Abstract
For multi-intersection control, existing research mainly adopts multiagent frameworks to tackle scalability issues. However, the traffic signal control (TSC) problem necessitates a single-agent framework, as a single control center monitors traffic conditions across all roads in the study area and coordinates the control of all intersections. This work proposes a novel single-agent, reinforcement learning (RL) based adaptive traffic signal control (ATSC) model for urban corridors: it abandons complex multiagent coordination and uses a single agent to centrally orchestrate signal timings across multiple intersections. Notably, the model is highly applicable to real-world settings. It defines state and reward functions based on a queue length metric that correlates with congestion and can be reliably estimated using probe vehicle data. Since probe vehicle data have become highly prevalent, this feature enables rapid, large-scale deployment. The single-agent framework relies primarily on a unique design of state, action, and reward. To facilitate learning and manage congestion, both state and reward functions are defined in terms of queue length, and actions are designed to modulate queue dynamics. The queue length definition used in this study deviates slightly from conventional definitions but is closely correlated with congestion states. The method was evaluated on the SUMO simulation platform under various traffic patterns. Experimental results show that the PPO algorithm learns significantly faster than the DQN algorithm. The model effectively alleviates urban corridor congestion through coordinated multi-intersection control: over the simulation period of the entire scenario, the queue length did not exceed 50 vehicles, and instances where it exceeded 30 vehicles were relatively rare. Compared with the baseline scenario, where queue lengths exceeded 150 vehicles, the proposed method significantly reduces road congestion. This work demonstrates the feasibility of controlling multiple intersections under a single-agent framework, and the control scope will be further expanded in the future.
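The abstract does not give the exact state, action, or reward definitions, so the following is only a minimal illustrative sketch of the general idea: one agent observes the queue lengths of every approach in the corridor, picks a single joint action that sets the phase at every intersection, and is rewarded for shorter queues. The toy corridor layout, the two-phase action set, the `discharge` rate, and the negative-total-queue reward are all assumptions made for illustration, not the paper's model.

```python
from itertools import product

# Hypothetical toy corridor: each intersection has two approaches,
# "main" (along the corridor) and "cross" (the side street).
N_INTERSECTIONS = 3
PHASES = ("main_green", "cross_green")

# A single agent chooses ONE joint action that fixes the phase at
# every intersection, instead of one agent per intersection.
JOINT_ACTIONS = list(product(range(len(PHASES)), repeat=N_INTERSECTIONS))


def build_state(queues):
    """Flatten per-intersection (main, cross) queue lengths into one vector."""
    return [q for pair in queues for q in pair]


def reward(queues):
    """Assumed reward: negative total queue length over the corridor."""
    return -sum(q for pair in queues for q in pair)


def step(queues, action_index, arrivals, discharge=4):
    """Apply one joint action.

    The approach that receives green discharges up to `discharge`
    vehicles; afterwards the new arrivals join each queue.
    """
    phases = JOINT_ACTIONS[action_index]
    next_queues = []
    for (main_q, cross_q), phase, (main_in, cross_in) in zip(queues, phases, arrivals):
        if PHASES[phase] == "main_green":
            main_q = max(0, main_q - discharge)
        else:
            cross_q = max(0, cross_q - discharge)
        next_queues.append((main_q + main_in, cross_q + cross_in))
    return next_queues, reward(next_queues)


if __name__ == "__main__":
    queues = [(10, 2), (6, 5), (3, 8)]
    arrivals = [(1, 1), (1, 1), (1, 1)]
    # Green for the corridor at the first two intersections,
    # green for the side street at the third.
    action = JOINT_ACTIONS.index((0, 0, 1))
    next_queues, r = step(queues, action, arrivals)
    print(build_state(next_queues), r)
```

In this sketch the joint action space grows as 2^N with the number of intersections, which is the scalability cost a single-agent design has to manage. In practice an RL algorithm such as PPO or DQN would choose `action_index` from the state vector rather than having it fixed by hand.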
About the journal:
The Journal of Advanced Transportation (JAT) is a fully peer-reviewed international journal covering transportation research areas related to public transit, road traffic, transport networks, and air transport.
It publishes theoretical and innovative papers on the analysis, design, operations, optimization, and planning of multi-modal transport networks, transit and traffic systems, transport technology, and traffic safety. Topics of interest include urban rail and bus systems, pedestrian studies, traffic flow theory and control, Intelligent Transport Systems (ITS), and automated and/or connected vehicles.
Highway engineering, railway engineering and logistics do not fall within the aims and scope of JAT.