Title: Vehicle-Level Fairness-Oriented Constrained Multi-Agent Reinforcement Learning for Adaptive Traffic Signal Control
Authors: Wanting Liu; Chengwei Zhang; Wanqing Fang; Kailing Zhou; Yihong Li; Furui Zhan; Qi Wang; Wanli Xue; Rong Chen
Journal: IEEE Transactions on Intelligent Transportation Systems, vol. 26, no. 4, pp. 4878-4890
Publication date: 2025-02-27
Publication type: Journal Article
DOI: 10.1109/TITS.2025.3544223
URL: https://ieeexplore.ieee.org/document/10907776/
Impact factor: 7.9 (JCR Q1, Engineering, Civil)
Open access: No
Citations: 0
Abstract
Multi-agent reinforcement learning (MARL) has shown considerable promise for improving the efficiency of adaptive traffic signal control (ATSC) systems. However, existing MARL approaches focus primarily on optimizing overall traffic flow and often overlook fairness in vehicle waiting times. Because perfect fairness is not required, this paper models the ATSC problem as a Constrained Partially Observable Markov Game (CPOMG), in which fairness is a constraint on the maximum waiting time of vehicles on intersection lanes rather than a reward term to be maximized. The goal of the CPOMG is a cooperative multi-agent control policy that achieves optimal traffic efficiency within the constrained solution space. On this basis, the paper proposes a new cooperative MARL method with centralized training and decentralized execution: vehicle-level fairness multi-agent proximal policy optimization (VF-MAPPO). VF-MAPPO uses a centrally trained global Critic network to estimate average vehicle traffic efficiency and maximum vehicle waiting time, and an Actor network shared by all intersections for decentralized execution. It converts the constrained optimization problem into an unconstrained objective through the Lagrange multiplier method and trains with proximal policy optimization. In addition, VF-MAPPO incorporates spatial-temporal graph attention in the Critic network to efficiently extract state representations in multi-intersection environments. We qualitatively analyze the monotonic improvement guarantee of VF-MAPPO. Extensive experiments on two real-world scenarios and one synthetic scenario show that VF-MAPPO improves vehicle-level fairness while maintaining average traffic efficiency, surpassing state-of-the-art methods.
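The abstract describes turning a waiting-time constraint into an unconstrained objective with a Lagrange multiplier and then optimizing it with a PPO-style update. The sketch below illustrates that general technique only; it is not the authors' implementation. All names (`clipped_surrogate`, `lagrangian_advantage`, `update_multiplier`, `policy_objective`, `cost_limit`) and the default hyperparameters are assumptions made for illustration, and the "cost" is assumed to be an estimate of maximum vehicle waiting time.

```python
from typing import Sequence


def clipped_surrogate(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate term for a single sample."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def lagrangian_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Fold the constraint into the learning signal.

    reward_adv: advantage for traffic efficiency (assumed, higher is better).
    cost_adv:   advantage for the waiting-time cost (assumed, higher is worse).
    lam:        non-negative Lagrange multiplier.
    """
    return reward_adv - lam * cost_adv


def update_multiplier(lam: float, mean_cost: float, cost_limit: float,
                      lr: float = 0.05) -> float:
    """Projected dual ascent: grow lam when the constraint is violated
    (mean_cost > cost_limit), shrink it otherwise, and never go below zero."""
    return max(0.0, lam + lr * (mean_cost - cost_limit))


def policy_objective(ratios: Sequence[float],
                     reward_advs: Sequence[float],
                     cost_advs: Sequence[float],
                     lam: float,
                     eps: float = 0.2) -> float:
    """Mean clipped surrogate over a batch, using the Lagrangian advantage."""
    if not ratios or not (len(ratios) == len(reward_advs) == len(cost_advs)):
        raise ValueError("inputs must be non-empty and of equal length")
    terms = [
        clipped_surrogate(r, lagrangian_advantage(a, c, lam), eps)
        for r, a, c in zip(ratios, reward_advs, cost_advs)
    ]
    return sum(terms) / len(terms)
```

In this sketch the policy is pushed toward efficiency while `lam` is small, and the waiting-time term gains weight only when the observed cost exceeds `cost_limit`. That matches the abstract's framing of fairness as a constraint to satisfy rather than a quantity to maximize.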
About the journal:
IEEE Transactions on Intelligent Transportation Systems covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). ITS are defined as those systems that use synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes promoting, consolidating, and coordinating ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internal and external.