HEV Energy Management Strategy Based on TD3 with Prioritized Exploration and Experience Replay

Yu He, Youngki Kim
2023 American Control Conference (ACC), May 31, 2023. DOI: 10.23919/ACC55779.2023.10156220

Abstract

This paper presents a novel energy management strategy for hybrid electric vehicles (HEVs) that is based on an expert twin-delayed deep deterministic policy gradient with prioritized exploration and experience replay (TD3-PEER). State-of-the-art TD3 requires critic networks to generate predicted Q values for state-action pairs to update a policy network. However, the critic networks may struggle to predict Q values for certain states when the Q values of those states are sensitive to action selection. To address this issue, this paper proposes a prioritized exploration technique that encourages the agent to visit action-sensitive states more frequently in the application of HEV energy management. The proposed algorithm is tested and validated on a P0+P4 HEV model. To simplify the control design, a motor activation threshold is introduced into the final layer of the agent's actor. In addition, dynamic programming results are incorporated into the training of the TD3 agent, helping it avoid inefficient operations. Simulation results demonstrate that, with expert knowledge considered for all learning-based methods, the proposed TD3-PEER outperforms other RL-based energy management strategies, including DDPG-PER and deep Q-network, by an average of 2.3% and 3.74% over the training and validation cycles, respectively.
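The prioritized experience replay underlying methods like TD3-PEER and DDPG-PER samples transitions in proportion to their TD error rather than uniformly, so that "surprising" (and, in this paper's extension, action-sensitive) experiences are revisited more often. The following is a minimal, generic sketch of proportional prioritized replay, not the authors' exact TD3-PEER implementation; the class name, hyperparameters `alpha` and `beta`, and buffer layout are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch,
    not the paper's TD3-PEER implementation)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priority shapes the sampling distribution
        self.beta = beta      # strength of the importance-sampling bias correction
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are sampled
        # at least once before their TD error has been computed.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) proportional to priority^alpha.
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias
        # in the critic's loss; normalized so the largest weight is 1.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # After a critic update, refresh priorities with the new |TD error|.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In a TD3-style loop, the agent would sample a weighted minibatch, scale each critic-loss term by its importance weight, and then call `update_priorities` with the fresh TD errors.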