Bridging phase and timing: A joint Q-value learning framework for synergistic traffic signal control at consecutive arterial road intersections
Haoran Liu, Xiaohu Li, Luodan Zhang, Rongjun Cheng
Physica A: Statistical Mechanics and its Applications, volume 688, Article 131421. Published 2026-04-15 (Epub 2026-02-25).
DOI: 10.1016/j.physa.2026.131421
https://www.sciencedirect.com/science/article/pii/S0378437126001573
Citations: 0
Abstract
Reinforcement learning (RL) has shown strong potential for traffic signal control. However, existing RL-based methods model the intersection state inadequately: they do not distinguish the direct impact of queued vehicles on the intersection from the dynamic impact of moving vehicles. They also fail to capture the synergy between signal phases and dynamic phase durations, which often leads to suboptimal decisions. This paper proposes an RL traffic signal control method based on joint Q-values. It employs a detailed, predictive state representation that accounts for the different effects of queued and moving vehicles on intersection congestion. Vehicle speed and position features are combined to quantify the dynamic influence of moving vehicles on future traffic pressure. The model integrates lane and phase features with a multi-head attention mechanism, which automatically captures the conflict and cooperation relationships among traffic flows. On this basis, a joint Q-value learning framework treats signal phase selection and dynamic duration decisions as a single decision unit for joint optimization. The framework learns the synergy between the two directly, avoiding suboptimal decisions in which an efficient phase is chosen but given insufficient duration. Experiments in both real and synthetic scenarios show that the method reduces average travel time by up to 7.20% across various intersection settings while also achieving higher vehicle throughput. Moreover, the model converges more quickly while maintaining strong performance and generalization capability.
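The idea of treating phase selection and duration as one decision unit can be illustrated with a minimal sketch. This is not the authors' implementation: the phase names, the duration set, the discount factor, and the function names `select_joint_action` and `td_target` are all hypothetical, and the target shown is the standard one-step Q-learning target, used here only as an assumed stand-in. The point is that a single Q-value scores each (phase, duration) pair, so the greedy choice cannot pick a good phase with a mismatched duration.

```python
from itertools import product

# Hypothetical action sets; the abstract does not list the actual phases or durations.
PHASES = ["NS_through", "NS_left", "EW_through", "EW_left"]
DURATIONS = [10, 15, 20, 25]  # seconds, illustrative values only

# Each joint action is one (phase, duration) pair, so one Q-value scores both choices together.
JOINT_ACTIONS = list(product(PHASES, DURATIONS))


def select_joint_action(q_values):
    """Return the (phase, duration) pair with the highest joint Q-value.

    q_values: sequence of floats, one per entry in JOINT_ACTIONS.
    """
    if len(q_values) != len(JOINT_ACTIONS):
        raise ValueError("expected one Q-value per joint action")
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return JOINT_ACTIONS[best_index]


def td_target(reward, next_q_values, gamma=0.95):
    """Standard one-step Q-learning target, maximising over all joint actions."""
    return reward + gamma * max(next_q_values)


# Usage: mark one pair as the best and recover it with a single argmax.
q = [0.0] * len(JOINT_ACTIONS)
q[JOINT_ACTIONS.index(("EW_through", 20))] = 1.5
chosen_phase, chosen_duration = select_joint_action(q)
```

In a learned model the list `q` would come from a network evaluated on the intersection state; here it is filled by hand to keep the sketch self-contained.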
About the journal:
Physica A: Statistical Mechanics and its Applications
Recognized by the European Physical Society
Physica A publishes research in the field of statistical mechanics and its applications.
Statistical mechanics sets out to explain the behaviour of macroscopic systems by studying the statistical properties of their microscopic constituents.
Applications of the techniques of statistical mechanics are widespread, and include: applications to physical systems such as solids, liquids and gases; applications to chemical and biological systems (colloids, interfaces, complex fluids, polymers and biopolymers, cell physics); and other interdisciplinary applications to, for instance, biological, economic and sociological systems.