Deep Reinforcement Learning for Energy Management in Hybrid Electric Vehicles With Softmax Double-Actor Regularized Critics

Impact factor: 4.8 | JCR Q1 | Engineering, Electrical & Electronic
Jewaliddin Shaik;Sri Phani Krishna Karri;Anugula Rajamallaiah;Kishore Bingi;Ramani Kannan;Vikas Singh Panwar
IEEE Open Journal of Vehicular Technology, vol. 7, pp. 723-736. Published 2026-02-03. DOI: 10.1109/OJVT.2026.3660677
Full text: https://ieeexplore.ieee.org/document/11370697/
Citations: 0

Abstract

Enhancing fuel efficiency in hybrid electric vehicles (HEVs) requires energy management strategies (EMSs) that can operate effectively under nonlinear powertrain dynamics and uncertain, time-varying driving conditions. This paper proposes a deep reinforcement learning (DRL)-based EMS using the double actors regularized critics softmax deep deterministic policy gradient (DARC SD3) algorithm, which integrates Boltzmann-softmax value estimation, a dual-actor architecture, and critic regularization to improve learning stability and value-estimation accuracy. Simulation results show that the proposed DARC SD3 achieves faster convergence, improved state-of-charge (SOC) regulation, and reduced value-estimation bias compared with DDPG, TD3, and baseline SD3. Under the FTP-75 driving cycle, the proposed EMS attains 94.6% of the dynamic programming (DP) benchmark fuel economy while reducing engine transients and smoothing battery power flow. Further evaluation on an unseen composite driving cycle confirms that the trained policy maintains consistent fuel economy and SOC control, demonstrating strong generalization capability across diverse driving conditions.
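To make the term "Boltzmann-softmax value estimation" concrete, the following is a minimal sketch of the generic Boltzmann softmax operator applied to a finite set of sampled Q-values. It is not taken from the paper: the function name `boltzmann_softmax_value`, the inverse-temperature parameter `beta`, and the assumption that the Q-values have already been sampled are all illustrative choices, and the paper's actual estimator, dual-actor update, and critic regularization are not reproduced here.

```python
import math


def boltzmann_softmax_value(q_values, beta):
    """Return the Boltzmann-softmax weighted average of sampled Q-values.

    Hypothetical helper for illustration only. `beta` >= 0 is the inverse
    temperature: beta = 0 gives the plain mean, and a large beta approaches
    the maximum of the sampled values.
    """
    if not q_values:
        raise ValueError("q_values must be non-empty")
    # Shift by the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    # Weighted average with weights proportional to exp(beta * Q).
    return sum(w * q for w, q in zip(weights, q_values)) / total


if __name__ == "__main__":
    sampled_q = [1.0, 2.0, 3.0]
    for beta in (0.0, 1.0, 1000.0):
        print(beta, boltzmann_softmax_value(sampled_q, beta))
```

The operator interpolates between the mean and the maximum of the sampled values, which is the usual motivation for a softmax-based target: it sits between an averaging estimate and the hard max used in standard Q-learning targets.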
Source journal
CiteScore: 9.60
Self-citation rate: 0.00%
Articles published: 25
Review time: 10 weeks