A decision-making of autonomous driving method based on DDPG with pretraining

Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng
{"title":"A decision-making of autonomous driving method based on DDPG with pretraining","authors":"Jinlin Ma, Mingyu Zhang, Kaiping Ma, Houzhong Zhang, Guoqing Geng","doi":"10.1177/09544070241227303","DOIUrl":null,"url":null,"abstract":"Present the DDPGwP (DDPG with Pretraining) model, grounded in the framework of deep reinforcement learning, designed for autonomous driving decision-making. The model incorporates imitation learning by utilizing expert experience for supervised learning during initial training and weight preservation. A novel loss function is devised, enabling the expert experience to jointly guide the Actor network’s update alongside the Critic network while also participating in the Critic network’s updates. This approach allows imitation learning to dominate the early stages of training, with reinforcement learning taking the lead in later stages. Employing experience replay buffer separation techniques, we categorize and store collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and conduct experimental validation, comparing the results with the original DDPG, A2C, and PPO algorithms. Experimental outcomes reveal that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and enhances algorithm stability and safety. The experience replay buffer separation technique improves sampling efficiency and mitigates algorithm overfitting. In addition to expediting algorithm training rates, our approach enables the simulated vehicle to learn superior strategies, garnering higher reward values. This demonstrates the superior stability, safety, and policy-making capabilities of the proposed algorithm, as well as accelerated network convergence.","PeriodicalId":509770,"journal":{"name":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-01-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/09544070241227303","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present the DDPGwP (DDPG with Pretraining) model, a deep-reinforcement-learning approach to autonomous driving decision-making. The model incorporates imitation learning by using expert experience for supervised learning during initial training and preserving the resulting weights. A novel loss function is devised that allows the expert experience to guide the Actor network's updates jointly with the Critic network, while also participating in the Critic network's updates. This lets imitation learning dominate the early stages of training, with reinforcement learning taking the lead in later stages. Using an experience-replay-buffer separation technique, we categorize and store the collected superior, ordinary, and expert experiences. We select sensor inputs from the TORCS (The Open Racing Car Simulator) simulation platform and conduct experimental validation, comparing the results with the original DDPG, A2C, and PPO algorithms. The experiments show that incorporating imitation learning significantly accelerates early-stage training, reduces blind trial-and-error during initial exploration, and improves the algorithm's stability and safety. The replay-buffer separation technique improves sampling efficiency and mitigates overfitting. Besides speeding up training, the approach enables the simulated vehicle to learn better strategies and earn higher reward values, demonstrating the superior stability, safety, and decision-making capability of the proposed algorithm, as well as faster network convergence.
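The joint imitation/reinforcement objective described in the abstract can be illustrated with a short sketch. The following PyTorch-style snippet is a minimal, illustrative example only (not the authors' exact formulation): it blends a behaviour-cloning term on expert transitions with the standard DDPG policy-gradient term, so that imitation dominates early training and reinforcement learning dominates later. The weighting schedule, function names, and network signatures are assumptions.

```python
# Illustrative sketch of an actor update that mixes behaviour cloning (expert
# supervision) with the usual DDPG objective; the decay schedule is assumed.
import torch
import torch.nn as nn


def actor_loss(actor, critic, states, expert_states, expert_actions,
               step, switch_steps=10_000):
    # Standard DDPG actor objective: maximise Q(s, pi(s)), i.e. minimise its negative.
    rl_term = -critic(states, actor(states)).mean()
    # Behaviour-cloning term: supervised regression onto the expert's actions.
    bc_term = nn.functional.mse_loss(actor(expert_states), expert_actions)
    # Assumed schedule: the imitation weight decays linearly to zero, so
    # imitation leads early and reinforcement learning takes over later.
    w_bc = max(0.0, 1.0 - step / switch_steps)
    return w_bc * bc_term + (1.0 - w_bc) * rl_term
```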
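The replay-buffer separation idea can likewise be sketched. The snippet below is a hypothetical illustration, assuming transitions are routed into "expert", "superior", and "ordinary" pools and mini-batches are drawn from all three; the routing rule (a reward threshold) and the mixing ratios are assumptions, not the paper's exact design.

```python
# Illustrative sketch of separated experience replay: expert, superior, and
# ordinary transitions are stored in distinct pools and sampled together.
import random
from collections import deque


class SeparatedReplayBuffer:
    def __init__(self, capacity=100_000, reward_threshold=0.0):
        self.expert = deque(maxlen=capacity)
        self.superior = deque(maxlen=capacity)
        self.ordinary = deque(maxlen=capacity)
        self.reward_threshold = reward_threshold  # assumed split rule

    def add(self, transition, from_expert=False):
        # transition = (state, action, reward, next_state, done)
        if from_expert:
            self.expert.append(transition)
        elif transition[2] > self.reward_threshold:
            self.superior.append(transition)
        else:
            self.ordinary.append(transition)

    def sample(self, batch_size, ratios=(0.3, 0.4, 0.3)):
        # Draw a fixed share of the batch from each pool; shrink the share
        # if a pool does not yet hold enough transitions.
        batch = []
        for pool, frac in zip((self.expert, self.superior, self.ordinary), ratios):
            k = min(len(pool), int(batch_size * frac))
            batch.extend(random.sample(list(pool), k))
        return batch
```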