Integrated Energy Optimization and Stability Control Using Deep Reinforcement Learning for an All-Wheel-Drive Electric Vehicle

Authors: Reza Jafari; Pouria Sarhadi; Amin Paykani; Shady S. Refaat; Pedram Asef
Journal: IEEE Open Journal of Vehicular Technology, vol. 6, pp. 2583-2606
Published: 2025-09-04 | DOI: 10.1109/OJVT.2025.3606120
URL: https://ieeexplore.ieee.org/document/11150741/
Citations: 0
Abstract
This study presents an innovative solution for simultaneous energy optimization and dynamic yaw control of all-wheel-drive (AWD) electric vehicles (EVs) using deep reinforcement learning (DRL) techniques. To this end, three model-free DRL methods, based on the deep deterministic policy gradient (DDPG), the twin delayed deep deterministic policy gradient (TD3), and TD3 enhanced with curriculum learning (CL TD3), are developed to determine the optimal yaw moment and perform energy optimization online. The proposed DRL controllers are benchmarked against model-based controllers, namely a linear quadratic regulator combined with sequential quadratic programming (LSQP) and a sliding mode controller combined with SQP (SSQP). A tailored multi-term reward function penalizes excessive yaw-rate error, sideslip angle, tire-slip deviations beyond the peak-grip region, and power losses derived from a realistic electric-machine efficiency map. The learning environment is built on a nonlinear double-track vehicle model that incorporates tire-road interactions. To evaluate the generalizability of the algorithms, the agents are tested across various velocities, tire-road friction coefficients, and additional scenarios implemented in IPG CarMaker, a high-fidelity vehicle dynamics simulator. Beyond being deployable without an explicit model of the plant, the simulation results show that the proposed solution improves vehicle dynamics and maneuverability in most cases compared with the conventional model-based controllers. Furthermore, the DRL controllers, especially the TD3 and CL TD3 algorithms, demonstrate reduced sideslip angle, excellent traction through minimized tire slip ratios, avoidance of oversteer and understeer, and an acceptable level of energy optimization.
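As a concrete illustration of the multi-term reward described in the abstract, the Python sketch below penalizes yaw-rate tracking error, sideslip angle, slip excursions beyond the peak-grip region, and power loss. The weights, the peak-grip slip threshold, and the function signature are hypothetical; the paper's actual coefficients and efficiency-map lookup are not given in the abstract.

```python
# Hypothetical weights and threshold; the paper's actual reward
# coefficients and electric-machine efficiency map are not stated
# in the abstract, so these values are illustrative only.
W_YAW, W_BETA, W_SLIP, W_POWER = 1.0, 0.5, 0.3, 1e-4
SLIP_PEAK = 0.12  # assumed slip ratio at peak tire grip

def reward(yaw_rate, yaw_rate_ref, sideslip, slip_ratios, power_loss):
    """Multi-term reward: penalize yaw-rate error, sideslip angle,
    slip beyond the peak-grip region, and drivetrain power loss."""
    yaw_pen = (yaw_rate - yaw_rate_ref) ** 2
    beta_pen = sideslip ** 2
    # Only the portion of each wheel's slip beyond the peak-grip
    # region is penalized, preserving slip needed for traction.
    slip_pen = sum(max(abs(s) - SLIP_PEAK, 0.0) ** 2 for s in slip_ratios)
    return -(W_YAW * yaw_pen + W_BETA * beta_pen
             + W_SLIP * slip_pen + W_POWER * power_loss)
```

For example, `reward(0.20, 0.25, 0.02, [0.05, 0.06, 0.14, 0.05], 350.0)` penalizes only the third wheel's slip, since the others stay below the assumed peak-grip threshold.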
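The TD3 algorithm named above combines twin critics, target policy smoothing, and delayed actor updates to stabilize learning. The sketch below shows these three mechanisms in generic PyTorch form; the network architectures, hyperparameters, and state/action definitions are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

GAMMA, TAU = 0.99, 0.005             # assumed discount and soft-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5  # target policy smoothing noise
POLICY_DELAY = 2                     # actor updated every 2 critic updates

class Actor(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 256), nn.ReLU(),
                                 nn.Linear(256, a_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    def __init__(self, s_dim, a_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim + a_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def td3_update(step, batch, actor, actor_t, c1, c2, c1_t, c2_t,
               actor_opt, critic_opt):
    s, a, r, s2, done = batch
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_t(s2) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q: the min of twin target critics curbs overestimation.
        q_t = r + GAMMA * (1 - done) * torch.min(c1_t(s2, a2), c2_t(s2, a2))
    critic_loss = (nn.functional.mse_loss(c1(s, a), q_t)
                   + nn.functional.mse_loss(c2(s, a), q_t))
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    if step % POLICY_DELAY == 0:
        # Delayed deterministic policy gradient step and soft target updates.
        actor_loss = -c1(s, actor(s)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for net, tgt in ((actor, actor_t), (c1, c1_t), (c2, c2_t)):
            for p, pt in zip(net.parameters(), tgt.parameters()):
                pt.data.mul_(1 - TAU).add_(TAU * p.data)
```

DDPG corresponds to the same update with a single critic, no smoothing noise, and no policy delay, which is why TD3 typically learns more stably on control tasks like this one.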
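CL TD3 augments TD3 with curriculum learning, i.e., training scenarios ordered from easy to hard. The abstract does not specify the stages, so the schedule below, progressing from low speed and high friction toward high speed and low friction, is purely illustrative.

```python
# Hypothetical curriculum for CL TD3; the paper's actual staging is
# not given in the abstract. Stages: (episode threshold, speed m/s, mu).
CURRICULUM = [
    (0,    15.0, 0.9),   # easy: low speed, dry asphalt
    (500,  25.0, 0.9),   # raise speed, keep grip
    (1000, 25.0, 0.5),   # reduce grip (wet road)
    (1500, 30.0, 0.3),   # hard: high speed, low grip (snow/ice)
]

def scenario_for(episode):
    """Return the (speed, friction) pair of the active curriculum stage."""
    speed, mu = CURRICULUM[0][1:]
    for threshold, v, m in CURRICULUM:
        if episode >= threshold:
            speed, mu = v, m
    return speed, mu
```

At the start of each training episode the environment would be reset with `scenario_for(episode)`, so the agent masters yaw-moment control under benign conditions before facing the low-friction cases that dominate the generalization tests.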