Vehicle-Level Fairness-Oriented Constrained Multi-Agent Reinforcement Learning for Adaptive Traffic Signal Control

IF 7.9 · Region 1, Engineering & Technology · JCR Q1, ENGINEERING, CIVIL

IEEE Transactions on Intelligent Transportation Systems Pub Date : 2025-02-27 DOI:10.1109/TITS.2025.3544223

Wanting Liu;Chengwei Zhang;Wanqing Fang;Kailing Zhou;Yihong Li;Furui Zhan;Qi Wang;Wanli Xue;Rong Chen

Vol. 26, No. 4, pp. 4878-4890 · Journal Article · https://ieeexplore.ieee.org/document/10907776/

Citations: 0

Abstract


Vehicle-Level Fairness-Oriented Constrained Multi-Agent Reinforcement Learning for Adaptive Traffic Signal Control

Multi-agent reinforcement learning (MARL) has shown considerable promise in improving the efficiency of adaptive traffic signal control (ATSC) systems. However, existing MARL approaches focus primarily on optimizing overall traffic flow and often overlook fairness in vehicle waiting times. Since perfect fairness need not be pursued, this paper models the ATSC problem as a Constrained Partially Observable Markov Game (CPOMG), in which fairness is a constraint on the maximum waiting time of vehicles on intersection lanes rather than a reward term to be maximized. The CPOMG seeks a cooperative multi-agent control policy with optimal traffic efficiency within the constrained solution space. On this basis, this paper proposes a new cooperative MARL method with centralized training and decentralized execution: vehicle-level fairness multi-agent proximal policy optimization (VF-MAPPO). VF-MAPPO uses a centrally trained global Critic network to estimate average vehicle traffic efficiency and maximum vehicle waiting time, and an Actor network shared by all intersections for decentralized execution. It converts the constrained optimization problem into an unconstrained objective through the Lagrange multiplier method and trains with proximal policy optimization. In addition, VF-MAPPO incorporates spatial-temporal graph attention in the Critic network to extract state representations efficiently in multi-intersection environments. We qualitatively analyze the monotonic improvement guarantee of VF-MAPPO. Experiments on two real-world scenarios and one synthetic scenario show that VF-MAPPO improves vehicle-level fairness while maintaining average traffic efficiency, surpassing state-of-the-art methods.
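
The abstract's step of turning the constrained problem into an unconstrained one with a Lagrange multiplier can be illustrated with a generic PPO-Lagrangian sketch. This is not the authors' implementation: the function names, the learning rate, the normalization by `1 + lam`, and the 60-second waiting-time limit are all assumptions made for illustration, and the abstract does not state how VF-MAPPO updates its multiplier.

```python
# Generic PPO-Lagrangian sketch (hypothetical names; not the VF-MAPPO code).
# The constrained problem "maximize efficiency subject to cost <= limit" is
# relaxed to "maximize efficiency - lam * (cost - limit)" with lam >= 0.

def lagrangian_advantage(adv_reward, adv_cost, lam):
    """Merge reward and cost advantages into a single training signal.

    Dividing by (1 + lam) is a common rescaling choice (assumed here) that
    keeps the magnitude of the signal bounded as lam grows.
    """
    return [(ar - lam * ac) / (1.0 + lam) for ar, ac in zip(adv_reward, adv_cost)]


def update_multiplier(lam, mean_cost, cost_limit, lr=0.05):
    """Projected dual ascent: raise lam when the constraint is violated,
    lower it otherwise, and never let it go below zero."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    lam = 0.0
    # Assumed numbers: observed maximum wait of 70 s against a 60 s limit.
    lam = update_multiplier(lam, mean_cost=70.0, cost_limit=60.0)
    combined = lagrangian_advantage([1.0, 0.5], [2.0, -1.0], lam)
    losses = [-clipped_surrogate(1.1, a) for a in combined]
    print(lam, combined, losses)
```

In a sketch like this, the multiplier acts as an automatically tuned penalty weight: it grows only while the waiting-time limit is exceeded, which matches the abstract's framing of fairness as a constraint rather than a reward term to be maximized.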


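Source journal

IEEE Transactions on Intelligent Transportation Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Articles published: 1872

Review time: 7.5 months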

About the journal: The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.