Yilong Ren, Yizhuo Chang, Zhiyong Cui, Xiao Chang, Haiyang Yu, Xiaosong Li, Yinhai Wang
Title: Is cooperative always better? Multi-Agent Reinforcement Learning with explicit neighborhood backtracking for network-wide traffic signal control
Journal: Transportation Research Part C-Emerging Technologies, volume 179, Article 105265
DOI: 10.1016/j.trc.2025.105265
Published: 2025-07-25
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0968090X25002694
Impact factor: 7.6; JCR: Q1 (Transportation Science & Technology)
Citations: 0
Abstract
Multi-Agent Reinforcement Learning (MARL) has been empirically shown to be a highly promising paradigm for Cooperative Traffic Signal Control (CTSC) in urban road networks. A review of the recent MARL-based literature reveals a counter-intuitive finding: several sophisticated approaches are outperformed by simpler independent control schemes when applied across multiple intersections. This paper analyzes the phenomenon and hypothesizes that the surveillance zone length may determine whether a MARL-based CTSC algorithm is effective. We verify this hypothesis qualitatively and quantitatively and find that the interactions between intersections are time-lagged. Because the surveillance zone is incomplete, we model the CTSC process as a decentralized partially observable Markov decision process (Dec-POMDP). We further propose ENB-RL, a MARL model with explicit neighborhood backtracking (ENB) that handles the lagged impact of neighbors. The core of our proposal is an ENB module, which consists of a neighborhood backtracking stack that stores and updates the historical throughput of neighboring intersections in a segmented, weighted way, and a multi-head attention model for spatio-temporally differentiated input. Such explicit and precise inputs can improve the agent’s observations in environments with incomplete perception. Because historical backtracking information may destabilize convergence, we introduce random Gaussian noise into the Double Deep Q-Network (DDQN) to generate uncertainty and improve the efficiency and stability of exploration. Experimental results show that ENB-RL has the best convergence performance on both synthetic and real-world datasets and outperforms other state-of-the-art MARL models. Ablation experiments confirm the efficacy of each component in the framework. Moreover, the proposed ENB module can be used as a plug-and-play component in mainstream RL-based models.
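The neighborhood backtracking stack mentioned in the abstract can be pictured with a minimal sketch. The abstract does not give the data structure, the segment boundaries, or the weighting scheme, so everything below is an assumption made for illustration: the class name `BacktrackingStack`, the parameters `horizon` and `segment_weights`, the equal-length segments, and the weighted-mean aggregation are hypothetical and are not taken from the paper.

```python
from collections import deque


class BacktrackingStack:
    """Hypothetical fixed-length history of one neighbor's per-step throughput."""

    def __init__(self, horizon, segment_weights):
        # Assumption: the history splits into equal-length segments,
        # one weight per segment, ordered from oldest to newest.
        if horizon % len(segment_weights) != 0:
            raise ValueError("horizon must be divisible by the number of segments")
        self.horizon = horizon
        self.segment_weights = list(segment_weights)
        self.history = deque([0.0] * horizon, maxlen=horizon)

    def push(self, throughput):
        # The newest value is appended on the right; once the buffer is
        # full, the oldest value is dropped from the left automatically.
        self.history.append(float(throughput))

    def features(self):
        # Return one value per segment: the segment's mean throughput
        # multiplied by that segment's weight.
        seg_len = self.horizon // len(self.segment_weights)
        values = list(self.history)
        out = []
        for i, weight in enumerate(self.segment_weights):
            segment = values[i * seg_len:(i + 1) * seg_len]
            out.append(weight * sum(segment) / seg_len)
        return out


# Usage: keep the last 4 steps, weighting the recent half more heavily.
stack = BacktrackingStack(horizon=4, segment_weights=[0.5, 1.0])
for vehicles_passed in [1, 2, 3, 4, 5, 6]:
    stack.push(vehicles_passed)
segment_features = stack.features()
```

In a setup like this, each agent would hold one such buffer per neighboring intersection and append the resulting segment features to its local observation before any attention layer; that wiring is likewise an assumption rather than the paper's stated design.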
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.