The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns

Tongyang Li, Jiageng Ruan, Kaixuan Zhang
{"title":"基于强化学习的连续急转弯道路自动驾驶端到端决策算法研究","authors":"Tongyang Li,&nbsp;Jiageng Ruan,&nbsp;Kaixuan Zhang","doi":"10.1016/j.geits.2025.100288","DOIUrl":null,"url":null,"abstract":"<div><div>Learning-based algorithm attracts great attention in the autonomous driving control field, especially for decision-making, to meet the challenge in long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with a significant effort. To improve the autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, three deep reinforcement learning algorithms, i.e. Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic policy gradient (TD3), and Soft Actor-Critic (SAC), based decision-making policies are proposed in this study. The role of the observation variable in agent training is discussed by comparing the driving stability, average speed, and consumed computational effort of the proposed algorithms in curves with various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse reward problem in the reward-guided algorithm. Simulation results from the road with consecutive sharp turns show that the DDPG, SAC, and TD3 algorithms-based vehicles take 367.2, 359.6, and 302.1 ​s to finish the task, respectively, which match the training results, and verifies the observation variable role in agent quality improvement.</div></div>","PeriodicalId":100596,"journal":{"name":"Green Energy and Intelligent Transportation","volume":"4 3","pages":"Article 100288"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns\",\"authors\":\"Tongyang Li,&nbsp;Jiageng Ruan,&nbsp;Kaixuan Zhang\",\"doi\":\"10.1016/j.geits.2025.100288\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Learning-based algorithm attracts great attention in the autonomous driving control field, especially for decision-making, to meet the challenge in long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with a significant effort. To improve the autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, three deep reinforcement learning algorithms, i.e. Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic policy gradient (TD3), and Soft Actor-Critic (SAC), based decision-making policies are proposed in this study. The role of the observation variable in agent training is discussed by comparing the driving stability, average speed, and consumed computational effort of the proposed algorithms in curves with various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse reward problem in the reward-guided algorithm. 
Simulation results from the road with consecutive sharp turns show that the DDPG, SAC, and TD3 algorithms-based vehicles take 367.2, 359.6, and 302.1 ​s to finish the task, respectively, which match the training results, and verifies the observation variable role in agent quality improvement.</div></div>\",\"PeriodicalId\":100596,\"journal\":{\"name\":\"Green Energy and Intelligent Transportation\",\"volume\":\"4 3\",\"pages\":\"Article 100288\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Green Energy and Intelligent Transportation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2773153725000386\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Green Energy and Intelligent Transportation","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773153725000386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Learning-based algorithms attract great attention in the autonomous driving control field, especially for decision-making, because they address the challenge of long-tail extreme scenarios, where traditional methods show poor adaptability even with significant effort. To improve autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, this study proposes decision-making policies based on three deep reinforcement learning algorithms: Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). The role of the observation variables in agent training is discussed by comparing the driving stability, average speed, and computational effort of the proposed algorithms on curves with various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse-reward problem in the reward-guided algorithm. Simulation results on a road with consecutive sharp turns show that vehicles based on the DDPG, SAC, and TD3 algorithms take 367.2, 359.6, and 302.1 s, respectively, to finish the task. These results match the training results and verify the role of the observation variables in improving agent quality.
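The paper does not publish code. For orientation only, the following is a minimal sketch of how three off-policy actor-critic algorithms of the kinds compared in the abstract could be trained under a common interface using Stable-Baselines3; Pendulum-v1 is a stand-in continuous-control task for the (non-public) driving simulator, and the timestep budget and default hyperparameters are assumptions, not the authors' setup.

```python
import gymnasium as gym
from stable_baselines3 import DDPG, SAC, TD3

# Stand-in continuous-control environment; the paper's consecutive-sharp-turn
# track is not public, so Pendulum-v1 is used here purely as a placeholder.
env = gym.make("Pendulum-v1")

# Train each algorithm with an identical budget so the comparison mirrors the
# study's like-for-like evaluation (library-default hyperparameters assumed).
for algo_cls in (DDPG, SAC, TD3):
    model = algo_cls("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save(f"{algo_cls.__name__.lower()}_sharp_turns")
```

All three algorithms share the same replay-buffer, actor-critic structure, which is what makes this kind of controlled comparison of observation variables and reward designs straightforward.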
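The abstract describes a reward that combines environment and vehicle states to densify an otherwise sparse reward, but gives no formula. Below is an illustrative sketch of such a dense reward; every signal name and weight here is an assumption for exposition, not the authors' design.

```python
def dense_reward(lateral_offset: float, heading_error: float,
                 speed: float, target_speed: float, off_track: bool) -> float:
    """Illustrative dense reward mixing environment state (lane geometry)
    with vehicle state (speed tracking). Weights are assumed, not from the paper."""
    if off_track:
        # Terminal penalty replaces a sparse goal-only reward at episode end.
        return -10.0
    # Environment-state terms: stay centered in the lane and aligned with it.
    r_track = -1.0 * abs(lateral_offset) - 0.5 * abs(heading_error)
    # Vehicle-state term: track a sensible target speed through the curve.
    r_speed = -0.2 * abs(speed - target_speed)
    # Small per-step bonus so continued progress outscores early termination.
    return r_track + r_speed + 0.1
```

Because every step returns an informative signal, the agent receives a learning gradient throughout the curve rather than only at task completion, which is the essence of the sparse-reward fix the abstract describes.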