Online prediction-assisted safe reinforcement learning for electric vehicle charging station recommendation in dynamically coupled transportation-power systems

IF 7.6; Zone 1 (Engineering & Technology); JCR Q1 (Transportation Science & Technology)

Transportation Research Part C-Emerging Technologies; Pub Date: 2025-05-17; DOI: 10.1016/j.trc.2025.105155

Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma

Volume 176, Article 105155. Journal Article; not open access. Full text: https://www.sciencedirect.com/science/article/pii/S0968090X25001597

Citations: 0

Abstract

With the proliferation of electric vehicles (EVs), the transportation network and the power grid are becoming increasingly interdependent, coupled via charging stations. The concomitant growth in charging demand poses challenges for both networks, highlighting the importance of charging coordination. However, existing literature largely overlooks the interaction between power grid security and traffic efficiency, in which deteriorating grid security also reduces traffic efficiency. In view of this, we study the en-route charging station (CS) recommendation problem for EVs in dynamically coupled transportation-power systems. The system-level objective is to maximize overall traffic efficiency while enhancing the safety of the power grid. The problem is formulated, for the first time, as a constrained Markov decision process (CMDP), and an online prediction-assisted safe reinforcement learning (OP-SRL) method is proposed to learn an optimal and secure policy. Specifically, we address two main challenges. First, the constrained optimization problem is converted into an equivalent unconstrained problem by applying the Lagrangian method, and Proximal Policy Optimization (PPO) is then extended to incorporate the constraint in the sequential decision process through the inclusion of a cost critic and a Lagrangian multiplier. Second, to account for the long and uncertain delay between issuing a charging station recommendation and the start of charging, we put forward an online sequence-to-sequence (Seq2Seq) predictor for state augmentation, which supplies forecast information that guides the agent toward forward-looking decisions. Finally, we conduct comprehensive experiments on the Nguyen-Dupuis network and a large-scale real-world road network, coupled with the IEEE 33-bus and IEEE 69-bus distribution systems, respectively. Results demonstrate that the proposed method outperforms baselines in terms of road network efficiency, power grid safety, and EV user satisfaction. The case study on the real-world network also illustrates the method's applicability in practice.
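The abstract's first step (Lagrangian relaxation of the CMDP, with PPO extended by a cost critic and a Lagrangian multiplier) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LagrangeMultiplier`, `combined_advantage`, and `ppo_clip_objective`, the dual step size, and the `1 + lambda` normalization are all assumptions chosen for illustration, following a common PPO-Lagrangian pattern in which the reward and cost advantages come from two separate critics.

```python
from dataclasses import dataclass


@dataclass
class LagrangeMultiplier:
    """Dual variable for one constraint J_c(pi) <= cost_limit (hypothetical helper)."""
    value: float = 0.0  # lambda, kept non-negative
    lr: float = 0.05    # dual step size (assumed, not from the paper)

    def update(self, mean_episode_cost: float, cost_limit: float) -> float:
        # Dual ascent: lambda grows when the constraint is violated and shrinks
        # otherwise; max(0, .) projects it back onto lambda >= 0.
        self.value = max(0.0, self.value + self.lr * (mean_episode_cost - cost_limit))
        return self.value


def combined_advantage(reward_adv, cost_adv, lam):
    """Mix reward-critic and cost-critic advantages: (A_r - lam * A_c) / (1 + lam).

    Dividing by (1 + lam) is an assumed normalization that keeps the scale of
    the policy gradient stable as lambda grows.
    """
    return [(a_r - lam * a_c) / (1.0 + lam) for a_r, a_c in zip(reward_adv, cost_adv)]


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, averaged over the sampled transitions."""
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In a training loop under these assumptions, one would compute both advantage estimates per batch, maximize `ppo_clip_objective` on the mixed advantage, and then call `update` with the batch's mean episode cost, so that the penalty on grid-safety violations tightens only while the constraint is being violated.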


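Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80

Self-citation rate: 12.00%

Articles published: 332

Review time: 64 days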

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.