The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns
{"title":"The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns","authors":"Tongyang Li, Jiageng Ruan, Kaixuan Zhang","doi":"10.1016/j.geits.2025.100288","DOIUrl":null,"url":null,"abstract":"<div><div>Learning-based algorithm attracts great attention in the autonomous driving control field, especially for decision-making, to meet the challenge in long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with a significant effort. To improve the autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, three deep reinforcement learning algorithms, i.e. Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic policy gradient (TD3), and Soft Actor-Critic (SAC), based decision-making policies are proposed in this study. The role of the observation variable in agent training is discussed by comparing the driving stability, average speed, and consumed computational effort of the proposed algorithms in curves with various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse reward problem in the reward-guided algorithm. Simulation results from the road with consecutive sharp turns show that the DDPG, SAC, and TD3 algorithms-based vehicles take 367.2, 359.6, and 302.1 s to finish the task, respectively, which match the training results, and verifies the observation variable role in agent quality improvement.</div></div>","PeriodicalId":100596,"journal":{"name":"Green Energy and Intelligent Transportation","volume":"4 3","pages":"Article 100288"},"PeriodicalIF":16.4000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Green Energy and Intelligent Transportation","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773153725000386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Learning-based algorithms have attracted great attention in the field of autonomous driving control, especially for decision-making, as a way to meet the challenge of long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with significant effort. To improve autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, decision-making policies based on three deep reinforcement learning algorithms, i.e., Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), are proposed in this study. The role of the observation variables in agent training is discussed by comparing the driving stability, average speed, and computational effort of the proposed algorithms on curves of various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse-reward problem in reward-guided algorithms. Simulation results on a road with consecutive sharp turns show that the vehicles based on the DDPG, SAC, and TD3 algorithms take 367.2, 359.6, and 302.1 s, respectively, to finish the task, which matches the training results and verifies the role of the observation variables in improving agent quality.
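The abstract does not give the paper's reward formulation. As a rough illustration of the general idea, a dense reward that combines environment state (lane position, heading relative to the road) with vehicle state (speed tracking) could replace a sparse finish-line signal. The sketch below is a minimal, hypothetical example; every variable name, weight, and threshold is an assumption for illustration, not a value from the paper.

```python
import math

def shaped_reward(lane_offset_m: float, heading_error_rad: float,
                  speed_mps: float, target_speed_mps: float,
                  off_road: bool) -> float:
    """Hypothetical dense reward mixing environment and vehicle states.

    All weights and terms are illustrative placeholders, not the
    authors' reward design.
    """
    if off_road:
        # Large terminal penalty: leaving the road ends the episode.
        return -100.0

    # Environment terms: stay centered in the lane and aligned with the curve.
    r_lane = math.exp(-abs(lane_offset_m))    # 1.0 when centered, decays off-center
    r_heading = math.cos(heading_error_rad)   # 1.0 when aligned with the road

    # Vehicle term: reward speed tracking at every step, so the agent
    # receives feedback throughout a sharp turn rather than only at the end.
    r_speed = 1.0 - min(abs(speed_mps - target_speed_mps) / target_speed_mps, 1.0)

    return 1.0 * r_lane + 0.5 * r_heading + 0.5 * r_speed
```

Because every step yields a graded signal instead of a single reward at task completion, any of the three off-policy agents (DDPG, TD3, SAC) receives informative gradients even early in training, which is the usual motivation for this kind of shaping on roads with consecutive sharp turns.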