Qionghua Liao , Guilong Li , Jiajie Yu , Ziyuan Gu , Wei Ma
{"title":"动态耦合交通-电力系统中电动汽车充电站推荐的在线预测辅助安全强化学习","authors":"Qionghua Liao , Guilong Li , Jiajie Yu , Ziyuan Gu , Wei Ma","doi":"10.1016/j.trc.2025.105155","DOIUrl":null,"url":null,"abstract":"<div><div>With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. However, existing literature largely overlooks the interactions between power grid security and traffic efficiency, where the deterioration of grid security also leads to a decrease in traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize the overall traffic efficiency while enhancing the safety of the power grid. This problem is for the first time formulated as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn the optimal and secure policy. To be specific, we mainly address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method, and then the Proximal Policy Optimization (PPO) method is extended to incorporate the constraint in the sequential decision process through the inclusions of cost critic and Lagrangian multiplier. Second, to account for the uncertain long-time delay between performing charging station recommendation and commencing charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation, offering foresightful information to guide the agent in making forward-thinking decisions. Finally, we conduct comprehensive experimental studies based on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the applicability in the practical context.</div></div>","PeriodicalId":54417,"journal":{"name":"Transportation Research Part C-Emerging Technologies","volume":"176 ","pages":"Article 105155"},"PeriodicalIF":7.6000,"publicationDate":"2025-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Online prediction-assisted safe reinforcement learning for electric vehicle charging station recommendation in dynamically coupled transportation-power systems\",\"authors\":\"Qionghua Liao , Guilong Li , Jiajie Yu , Ziyuan Gu , Wei Ma\",\"doi\":\"10.1016/j.trc.2025.105155\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. However, existing literature largely overlooks the interactions between power grid security and traffic efficiency, where the deterioration of grid security also leads to a decrease in traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize the overall traffic efficiency while enhancing the safety of the power grid. This problem is for the first time formulated as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn the optimal and secure policy. To be specific, we mainly address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method, and then the Proximal Policy Optimization (PPO) method is extended to incorporate the constraint in the sequential decision process through the inclusions of cost critic and Lagrangian multiplier. Second, to account for the uncertain long-time delay between performing charging station recommendation and commencing charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation, offering foresightful information to guide the agent in making forward-thinking decisions. Finally, we conduct comprehensive experimental studies based on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the applicability in the practical context.</div></div>\",\"PeriodicalId\":54417,\"journal\":{\"name\":\"Transportation Research Part C-Emerging Technologies\",\"volume\":\"176 \",\"pages\":\"Article 105155\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transportation Research Part C-Emerging Technologies\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0968090X25001597\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"TRANSPORTATION SCIENCE & TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part C-Emerging Technologies","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968090X25001597","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Online prediction-assisted safe reinforcement learning for electric vehicle charging station recommendation in dynamically coupled transportation-power systems
With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. However, existing literature largely overlooks the interactions between power grid security and traffic efficiency, where the deterioration of grid security also leads to a decrease in traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize the overall traffic efficiency while enhancing the safety of the power grid. This problem is for the first time formulated as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn the optimal and secure policy. To be specific, we mainly address two challenges. First, the constrained optimization problem is converted into an equivalent unconstrained optimization problem by applying the Lagrangian method, and then the Proximal Policy Optimization (PPO) method is extended to incorporate the constraint in the sequential decision process through the inclusions of cost critic and Lagrangian multiplier. Second, to account for the uncertain long-time delay between performing charging station recommendation and commencing charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation, offering foresightful information to guide the agent in making forward-thinking decisions. Finally, we conduct comprehensive experimental studies based on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the applicability in the practical context.
期刊介绍:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.