Engineering A Large-Scale Traffic Signal Control: A Multi-Agent Reinforcement Learning Approach

Yue Chen, Changle Li, Wenwei Yue, Hehe Zhang, Guoqiang Mao
{"title":"Engineering A Large-Scale Traffic Signal Control: A Multi-Agent Reinforcement Learning Approach","authors":"Yue Chen, Changle Li, Wenwei Yue, Hehe Zhang, Guoqiang Mao","doi":"10.1109/INFOCOMWKSHPS51825.2021.9484451","DOIUrl":null,"url":null,"abstract":"Reinforcement learning is of vital significance in machine learning and is also a promising approach for traffic signal control in urban road networks with assistance of deep neural networks. However, in a large scale urban network, the centralized reinforcement learning approach is beset with difficulties due to the extremely high dimension of joint action space. The multi-agent reinforcement learning (MARL) approach overcomes the high dimension problem by employing distributed local agents whose action space is much smaller. Even though, MARL approach introduces another issue that multiple agents interact with environment simultaneously causing its instability so that training each agent independently may not converge. This paper presents an actor-critic based decentralized MARL approach to control traffic signal which overcomes the shortcomings of both centralized RL approach and independent MARL approach. In particular, a distributed critic network is designed which overcomes the difficulty to train a large-scale neural network in centralized RL approach. Moreover, a difference reward method is proposed to evaluate the contribution of each agent, which accelerates the convergence of algorithm and makes agents optimize policy in a more accurate direction. The proposed MARL approach is compared against the fully independent approach and the centralized learning approach in a grid network. Simulation results demonstrate its effectiveness in terms of average travel speed, travel delay and queue length over other MARL algorithms.","PeriodicalId":109588,"journal":{"name":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE INFOCOM 2021 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INFOCOMWKSHPS51825.2021.9484451","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Reinforcement learning is of vital significance in machine learning and, with the assistance of deep neural networks, is also a promising approach for traffic signal control in urban road networks. However, in a large-scale urban network, the centralized reinforcement learning approach is beset with difficulties due to the extremely high dimension of the joint action space. The multi-agent reinforcement learning (MARL) approach overcomes the high-dimension problem by employing distributed local agents whose action spaces are much smaller. However, the MARL approach introduces another issue: multiple agents interact with the environment simultaneously, making the environment non-stationary from each agent's perspective, so training each agent independently may not converge. This paper presents an actor-critic-based decentralized MARL approach to traffic signal control that overcomes the shortcomings of both the centralized RL approach and the independent MARL approach. In particular, a distributed critic network is designed, which avoids the difficulty of training a large-scale neural network in the centralized RL approach. Moreover, a difference reward method is proposed to evaluate the contribution of each agent, which accelerates the convergence of the algorithm and guides agents to optimize their policies in a more accurate direction. The proposed MARL approach is compared against the fully independent approach and the centralized learning approach in a grid network. Simulation results demonstrate its effectiveness over other MARL algorithms in terms of average travel speed, travel delay, and queue length.
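The difference reward credits each agent with the change in the global objective attributable to its own action: D_i = G(a) - G(a_{-i}), where G is the global reward and a_{-i} replaces agent i's action with a counterfactual baseline. Below is a minimal Python sketch of this idea, assuming a simulator callable that maps a joint signal-phase action to per-intersection queue lengths and a fixed default action as the counterfactual baseline; both are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of the difference-reward idea described in the abstract.
# The simulator interface and the default-action counterfactual are
# illustrative assumptions; the paper's exact formulation may differ.

from typing import Callable, List


def global_reward(queue_lengths: List[float]) -> float:
    """Global objective: negative total queue length across all intersections."""
    return -sum(queue_lengths)


def difference_rewards(
    joint_action: List[int],
    default_action: int,
    simulate: Callable[[List[int]], List[float]],
) -> List[float]:
    """Credit agent i with D_i = G(a) - G(a with a_i replaced by a default).

    `simulate` maps a joint signal-phase action to per-intersection queue
    lengths; replacing agent i's action with a fixed default gives the
    counterfactual term, so D_i isolates agent i's own contribution.
    """
    g = global_reward(simulate(joint_action))
    rewards = []
    for i in range(len(joint_action)):
        counterfactual = list(joint_action)
        counterfactual[i] = default_action  # counterfactual: agent i acts by default
        rewards.append(g - global_reward(simulate(counterfactual)))
    return rewards


if __name__ == "__main__":
    # Toy simulator: an intersection's queue shrinks if its phase action is 1.
    def toy(actions: List[int]) -> List[float]:
        return [5.0 - 2.0 * a for a in actions]

    print(difference_rewards([1, 0, 1], default_action=0, simulate=toy))
    # -> [2.0, 0.0, 2.0]: each agent choosing phase 1 reduced the total queue by 2.
```

Because each D_i subtracts out the effect of all other agents' actions, an agent that changes its policy sees a reward signal driven only by its own contribution, which is why the abstract credits the method with faster, more accurately directed convergence.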