Congestion-aware platoon re-sequencing optimization for electric vehicles using deep reinforcement learning
Authors: Chu Peng, Shaopan Guo, Miao Liu, Long Xiao
Journal: Neurocomputing, volume 676, Article 133040 (impact factor 6.5; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.neucom.2026.133040
Publication date: 2026-05-01 (Epub 2026-02-13)
URL: https://www.sciencedirect.com/science/article/pii/S0925231226004376
Citations: 0
Abstract
With the development of Vehicle-to-Vehicle (V2V) communication, the non-fixed platoon method has become feasible, enabling vehicles to adjust positions dynamically, balance energy use, and improve efficiency. However, existing methods ignore the dynamic nature of traffic conditions. When road space is limited, platoon re-sequencing may become unsafe or even infeasible. To address these challenges, we propose a congestion-aware platoon re-sequencing optimization framework for electric vehicles (EVs) using deep reinforcement learning. The framework consists of two modules: a Traffic Congestion-Aware (TCA) module and a Deep Reinforcement Learning (DRL) module. Specifically, the TCA module predicts traffic congestion categories and incorporates them as constraints in the optimization process, overcoming the limitations of non-fixed platoon methods that neglect the safety and feasibility impacts of traffic congestion on re-sequencing. The DRL module, built on the Trust Region Policy Optimization (TRPO) algorithm, takes the EV State-of-Charge (SoC) and predicted traffic congestion categories as environmental observations. It restricts re-sequencing operations under congested conditions to prevent invalid actions and simultaneously manages the computational complexity that arises with increasing platoon size. Experimental results demonstrate that, compared to existing reinforcement learning methods without congestion constraints, our proposed framework reduces the frequency of platoon re-sequencing by 34.4%. Moreover, it achieves a 23.6% reduction in the final standard deviation of the SoC across all vehicles compared to existing re-sequencing algorithms, indicating that the unbalanced energy consumption of the vehicles has been reduced.
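The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: restricting re-sequencing actions under congestion, and measuring energy balance as the standard deviation of SoC across the platoon. The congestion categories, the action encoding ("move the vehicle at position k to the lead"), and every function name here are assumptions made for illustration; the greedy rule is a stand-in for the trained TRPO policy, not the authors' method.

```python
import statistics

# Hypothetical congestion categories; the abstract does not enumerate them.
FREE_FLOW, MODERATE, CONGESTED = 0, 1, 2


def valid_actions(platoon_size, congestion):
    """Return the indices of the actions permitted in the current state.

    Assumed encoding: action 0 keeps the current order, and action k
    (k >= 1) moves the vehicle at position k to the lead position.
    """
    if congestion == CONGESTED:
        return [0]  # re-sequencing is blocked; only the no-op remains
    return list(range(platoon_size))


def apply_action(order, action):
    """Return a new order with the vehicle at index `action` moved to the front."""
    if action == 0:
        return list(order)
    return [order[action]] + order[:action] + order[action + 1:]


def greedy_leader(soc_by_position, congestion):
    """Stand-in policy: among permitted actions, pick the highest-SoC vehicle."""
    allowed = valid_actions(len(soc_by_position), congestion)
    return max(allowed, key=lambda k: soc_by_position[k])


def soc_std(soc_values):
    """Population standard deviation of SoC, the balance metric in the abstract."""
    return statistics.pstdev(soc_values)


if __name__ == "__main__":
    soc = [0.60, 0.80, 0.70]  # SoC of the vehicles, listed by platoon position
    print(greedy_leader(soc, FREE_FLOW))   # index of the vehicle promoted to lead
    print(greedy_leader(soc, CONGESTED))   # only the no-op is allowed
    print(soc_std(soc))
```

In a learned policy the same mask would typically be applied to the action distribution before sampling, so that invalid re-sequencing moves are never selected under congestion.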
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.