Integrated Energy Optimization and Stability Control Using Deep Reinforcement Learning for an All-Wheel-Drive Electric Vehicle

Authors: Reza Jafari; Pouria Sarhadi; Amin Paykani; Shady S. Refaat; Pedram Asef
Journal: IEEE Open Journal of Vehicular Technology, vol. 6, pp. 2583-2606
Published: 2025-09-04 | DOI: 10.1109/OJVT.2025.3606120
URL: https://ieeexplore.ieee.org/document/11150741/
Citations: 0
Abstract
This study presents an innovative solution for simultaneous energy optimization and dynamic yaw control of all-wheel-drive (AWD) electric vehicles (EVs) using deep reinforcement learning (DRL) techniques. To this end, three model-free DRL methods, based on the deep deterministic policy gradient (DDPG), the twin delayed deep deterministic policy gradient (TD3), and TD3 enhanced with curriculum learning (CL TD3), are developed to determine the optimal yaw moment and perform energy optimization online. The proposed DRL controllers are benchmarked against model-based controllers, namely a linear quadratic regulator combined with sequential quadratic programming (LSQP) and a sliding mode controller combined with SQP (SSQP). A tailored multi-term reward function penalizes excessive yaw-rate error, sideslip angle, tire-slip deviations beyond the peak-grip region, and power losses derived from a realistic electric-machine efficiency map. The learning environment is built on a nonlinear double-track vehicle model that incorporates tire-road interactions. To evaluate the generalizability of the algorithms, the agents are tested across various velocities, tire-road friction coefficients, and additional scenarios implemented in IPG CarMaker, a high-fidelity vehicle dynamics simulator. Beyond being deployable without an explicit model of the plant, the simulation results show that the proposed solution improves vehicle dynamics and maneuverability in most cases compared with the conventional model-based controllers. Furthermore, the DRL controllers, especially the TD3 and CL TD3 algorithms, demonstrate reduced sideslip angle, excellent traction through minimized tire slip ratios, avoidance of oversteer and understeer, and an acceptable level of energy optimization.
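As a concrete illustration of the multi-term reward described in the abstract, the Python sketch below penalizes yaw-rate tracking error, sideslip angle, slip excursions beyond the peak-grip region, and power loss. The weights, the peak-grip slip threshold, and the function signature are hypothetical; the paper's actual coefficients and efficiency-map lookup are not given in the abstract.

```python
# Hypothetical weights and threshold; the paper's actual reward
# coefficients and electric-machine efficiency map are not stated
# in the abstract, so these values are illustrative only.
W_YAW, W_BETA, W_SLIP, W_POWER = 1.0, 0.5, 0.3, 1e-4
SLIP_PEAK = 0.12  # assumed slip ratio at peak tire grip

def reward(yaw_rate, yaw_rate_ref, sideslip, slip_ratios, power_loss):
    """Multi-term reward: penalize yaw-rate error, sideslip angle,
    slip beyond the peak-grip region, and drivetrain power loss."""
    yaw_pen = (yaw_rate - yaw_rate_ref) ** 2
    beta_pen = sideslip ** 2
    # Only the portion of each wheel's slip beyond the peak-grip
    # region is penalized, preserving slip needed for traction.
    slip_pen = sum(max(abs(s) - SLIP_PEAK, 0.0) ** 2 for s in slip_ratios)
    return -(W_YAW * yaw_pen + W_BETA * beta_pen
             + W_SLIP * slip_pen + W_POWER * power_loss)
```

For example, `reward(0.20, 0.25, 0.02, [0.05, 0.06, 0.14, 0.05], 350.0)` penalizes only the third wheel's slip, since the others stay below the assumed peak-grip threshold.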
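The TD3 algorithm named above combines twin critics, target policy smoothing, and delayed actor updates to stabilize learning. The sketch below shows these three mechanisms in generic PyTorch form; the network architectures, hyperparameters, and state/action definitions are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

GAMMA, TAU = 0.99, 0.005             # assumed discount and soft-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5  # target policy smoothing noise
POLICY_DELAY = 2                     # actor updated every 2 critic updates

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 256), nn.ReLU(),
                                 nn.Linear(256, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def td3_update(step, batch, actor, actor_t, c1, c2, c1_t, c2_t,
               actor_opt, critic_opt):
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: the min of twin target critics curbs overestimation.
        q_t = r + GAMMA * (1 - done) * torch.min(c1_t(s2, a2), c2_t(s2, a2))
    critic_loss = (nn.functional.mse_loss(c1(s, a), q_t)
                   + nn.functional.mse_loss(c2(s, a), q_t))
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % POLICY_DELAY == 0:
        # Delayed deterministic policy gradient step and soft target updates.
        actor_loss = -c1(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for net, tgt in ((actor, actor_t), (c1, c1_t), (c2, c2_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1 - TAU).add_(TAU * p.data)
```

DDPG corresponds to the same update with a single critic, no smoothing noise, and no policy delay, which is why TD3 typically learns more stably on control tasks like this one.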
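CL TD3 augments TD3 with curriculum learning, i.e., training scenarios ordered from easy to hard. The abstract does not specify the stages, so the schedule below, progressing from low speed and high friction toward high speed and low friction, is purely illustrative.

```python
# Hypothetical curriculum for CL TD3; the paper's actual staging is
# not given in the abstract. Stages: (episode threshold, speed m/s, mu).
CURRICULUM = [
    (0,    15.0, 0.9),   # easy: low speed, dry asphalt
    (500,  25.0, 0.9),   # raise speed, keep grip
    (1000, 25.0, 0.5),   # reduce grip (wet road)
    (1500, 30.0, 0.3),   # hard: high speed, low grip (snow/ice)
]

def scenario_for(episode):
    """Return the (speed, friction) pair of the active curriculum stage."""
    speed, mu = CURRICULUM[0][1:]
    for threshold, v, m in CURRICULUM:
        if episode >= threshold:
            speed, mu = v, m
    return speed, mu
```

At the start of each training episode the environment would be reset with `scenario_for(episode)`, so the agent masters yaw-moment control under benign conditions before facing the low-friction cases that dominate the generalization tests.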