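Is cooperative always better? Multi-Agent Reinforcement Learning with explicit neighborhood backtracking for network-wide traffic signal control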

IF 7.6 | Zone 1, Engineering & Technology | JCR Q1, TRANSPORTATION SCIENCE & TECHNOLOGY

Transportation Research Part C-Emerging Technologies. Pub Date: 2025-07-25. DOI: 10.1016/j.trc.2025.105265

Yilong Ren, Yizhuo Chang, Zhiyong Cui, Xiao Chang, Haiyang Yu, Xiaosong Li, Yinhai Wang

Volume 179, Article 105265. URL: https://www.sciencedirect.com/science/article/pii/S0968090X25002694

Citations: 0

Abstract



Multi-Agent Reinforcement Learning (MARL) has been empirically shown to be a highly promising paradigm for the Cooperative Traffic Signal Control (CTSC) of urban road networks. A review of recent MARL-based literature reveals a counter-intuitive finding: when applied across multiple intersections, several sophisticated approaches have been outperformed by simpler independent control schemes. This paper analyzes the phenomenon and proposes the hypothesis that the surveillance zone length setting may determine whether a MARL-based CTSC algorithm is effective. We prove this hypothesis qualitatively and quantitatively and find that interactions between intersections are time-lagged. Given the incomplete surveillance zone, we model the CTSC process as a decentralized partially observable Markov decision process (Dec-POMDP). We then propose ENB-RL, a MARL model with explicit neighborhood backtracking that handles the lagged impacts of neighbors. The core of our proposal is an ENB module, which consists of a neighborhood backtracking stack that stores and updates the historical throughput of neighboring intersections in a segmented, weighted way, and a multi-head attention model for spatio-temporally differentiated input. Such explicit and precise inputs can improve the agent's observations in environments with incomplete perception. Because historical backtracking information may destabilize convergence, we introduce random Gaussian noise into the Double Deep Q-Network (DDQN) to generate uncertainty and improve the efficiency and stability of exploration. Experimental results show that ENB-RL achieves the best convergence performance on both synthetic and real-world datasets and outperforms other state-of-the-art MARL models. Ablation experiments confirm the efficacy of each component in the framework. Moreover, the proposed ENB module can be used as a plug-and-play component in mainstream RL-based models.
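The abstract describes the ENB module's neighborhood backtracking stack only at a high level. Below is a minimal Python sketch, under stated assumptions, of what a per-neighbor throughput history with segment-wise weighting could look like. The class name `NeighborhoodBacktrackingStack`, the history length, the equal-length segments ordered newest to oldest, and the fixed segment weights are hypothetical choices for illustration, not the paper's implementation. The multi-head attention stage and the noisy DDQN are not shown.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence


class NeighborhoodBacktrackingStack:
    """Hypothetical per-neighbor throughput history with segment-wise weighting.

    Illustrative assumptions (not taken from the paper): a fixed history
    length, equal-length contiguous segments ordered newest to oldest, and
    one fixed weight per segment.
    """

    def __init__(
        self,
        neighbors: Sequence[str],
        history_len: int = 12,
        segment_weights: Sequence[float] = (0.5, 0.3, 0.2),
    ) -> None:
        k = len(segment_weights)
        if k == 0 or history_len <= 0 or history_len % k != 0:
            raise ValueError("history_len must be a positive multiple of the segment count")
        self.history_len = history_len
        self.weights = list(segment_weights)
        # One bounded queue per neighbor; appending beyond maxlen drops the oldest entry.
        self.buffers: Dict[str, Deque[float]] = {
            name: deque(maxlen=history_len) for name in neighbors
        }

    def push(self, throughput: Dict[str, float]) -> None:
        """Append the latest throughput of each neighbor (0.0 if unreported)."""
        for name, buf in self.buffers.items():
            buf.append(float(throughput.get(name, 0.0)))

    def features(self) -> List[List[float]]:
        """Return one row per neighbor: weighted mean throughput of each segment.

        Segment 0 covers the most recent steps; missing history counts as 0.0.
        """
        k = len(self.weights)
        seg_len = self.history_len // k
        rows = []
        for buf in self.buffers.values():
            # Newest first, zero-padded up to the full history length.
            hist = list(reversed(buf)) + [0.0] * (self.history_len - len(buf))
            row = []
            for i, w in enumerate(self.weights):
                segment = hist[i * seg_len:(i + 1) * seg_len]
                row.append(w * sum(segment) / seg_len)
            rows.append(row)
        return rows


# Usage: two hypothetical neighbors observed over six steps.
stack = NeighborhoodBacktrackingStack(["north", "east"], history_len=6)
for t in range(6):
    stack.push({"north": t, "east": 2 * t})
print(stack.features())
```

Using `deque(maxlen=...)` keeps the update step constant-time and discards stale history automatically. In the architecture the abstract describes, a feature matrix like this would feed the multi-head attention model rather than being used directly as the observation.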


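Source journal: Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80

Self-citation rate: 12.00%

Articles published: 332

Review time: 64 days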

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.