HEV Energy Management Strategy Based on TD3 with Prioritized Exploration and Experience Replay

Yu He, Youngki Kim
2023 American Control Conference (ACC), May 31, 2023. DOI: 10.23919/ACC55779.2023.10156220

Abstract

This paper presents a novel energy management strategy for hybrid electric vehicles (HEVs) that is based on an expert twin-delayed deep deterministic policy gradient with prioritized exploration and experience replay (TD3-PEER). State-of-the-art TD3 requires critic networks to generate predicted Q values for state-action pairs to update a policy network. However, the critic networks may struggle to predict Q values for certain states when the Q values of those states are sensitive to action selection. To address this issue, this paper proposes a prioritized exploration technique that encourages the agent to visit action-sensitive states more frequently in the application of HEV energy management. The proposed algorithm is tested and validated on a P0+P4 HEV model. To simplify the control design, a motor activation threshold is introduced into the final layer of the agent's actor. In addition, dynamic programming results are incorporated into the training of the TD3 agent, helping it avoid inefficient operations. Simulation results demonstrate that, with expert knowledge considered for all learning-based methods, the proposed TD3-PEER outperforms other RL-based energy management strategies, including DDPG-PER and deep Q-network, by an average of 2.3% and 3.74% over the training and validation cycles, respectively.
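The prioritized experience replay underlying methods like TD3-PEER and DDPG-PER samples transitions in proportion to their TD error rather than uniformly, so that "surprising" (and, in this paper's extension, action-sensitive) experiences are revisited more often. The following is a minimal, generic sketch of proportional prioritized replay, not the authors' exact TD3-PEER implementation; the class name, hyperparameters `alpha` and `beta`, and buffer layout are illustrative assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch,
    not the paper's TD3-PEER implementation)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4):
        self.capacity = capacity
        self.alpha = alpha    # how strongly priority shapes the sampling distribution
        self.beta = beta      # strength of the importance-sampling bias correction
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current max priority so they are sampled
        # at least once before their TD error has been computed.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) proportional to priority^alpha.
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias
        # in the critic's loss; normalized so the largest weight is 1.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # After a critic update, refresh priorities with the new |TD error|.
        self.priorities[idx] = np.abs(td_errors) + eps
```

In a TD3-style loop, the agent would sample a weighted minibatch, scale each critic-loss term by its importance weight, and then call `update_priorities` with the fresh TD errors.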