The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns
{"title":"The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns","authors":"Tongyang Li, Jiageng Ruan, Kaixuan Zhang","doi":"10.1016/j.geits.2025.100288","DOIUrl":null,"url":null,"abstract":"<div><div>Learning-based algorithm attracts great attention in the autonomous driving control field, especially for decision-making, to meet the challenge in long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with a significant effort. To improve the autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, three deep reinforcement learning algorithms, i.e. Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic policy gradient (TD3), and Soft Actor-Critic (SAC), based decision-making policies are proposed in this study. The role of the observation variable in agent training is discussed by comparing the driving stability, average speed, and consumed computational effort of the proposed algorithms in curves with various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse reward problem in the reward-guided algorithm. Simulation results from the road with consecutive sharp turns show that the DDPG, SAC, and TD3 algorithms-based vehicles take 367.2, 359.6, and 302.1 s to finish the task, respectively, which match the training results, and verifies the observation variable role in agent quality improvement.</div></div>","PeriodicalId":100596,"journal":{"name":"Green Energy and Intelligent Transportation","volume":"4 3","pages":"Article 100288"},"PeriodicalIF":16.4000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Green Energy and Intelligent Transportation","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2773153725000386","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Learning-based algorithms have attracted great attention in the field of autonomous driving control, especially for decision-making, as a way to meet the challenge of long-tail extreme scenarios, where traditional methods demonstrate poor adaptability even with significant effort. To improve autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, decision-making policies based on three deep reinforcement learning algorithms, i.e., Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC), are proposed in this study. The role of the observation variables in agent training is discussed by comparing the driving stability, average speed, and computational effort of the proposed algorithms on curves of various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to solve the sparse-reward problem in reward-guided algorithms. Simulation results on a road with consecutive sharp turns show that the vehicles based on the DDPG, SAC, and TD3 algorithms take 367.2, 359.6, and 302.1 s, respectively, to finish the task, which matches the training results and verifies the role of the observation variables in improving agent quality.
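The abstract does not give the paper's reward formulation. As a rough illustration of the general idea, a dense reward that combines environment state (lane position, heading relative to the road) with vehicle state (speed tracking) could replace a sparse finish-line signal. The sketch below is a minimal, hypothetical example; every variable name, weight, and threshold is an assumption for illustration, not a value from the paper.

```python
import math

def shaped_reward(lane_offset_m: float, heading_error_rad: float,
                  speed_mps: float, target_speed_mps: float,
                  off_road: bool) -> float:
    """Hypothetical dense reward mixing environment and vehicle states.

    All weights and terms are illustrative placeholders, not the
    authors' reward design.
    """
    if off_road:
        # Large terminal penalty: leaving the road ends the episode.
        return -100.0

    # Environment terms: stay centered in the lane and aligned with the curve.
    r_lane = math.exp(-abs(lane_offset_m))    # 1.0 when centered, decays off-center
    r_heading = math.cos(heading_error_rad)   # 1.0 when aligned with the road

    # Vehicle term: reward speed tracking at every step, so the agent
    # receives feedback throughout a sharp turn rather than only at the end.
    r_speed = 1.0 - min(abs(speed_mps - target_speed_mps) / target_speed_mps, 1.0)

    return 1.0 * r_lane + 0.5 * r_heading + 0.5 * r_speed
```

Because every step yields a graded signal instead of a single reward at task completion, any of the three off-policy agents (DDPG, TD3, SAC) receives informative gradients even early in training, which is the usual motivation for this kind of shaping on roads with consecutive sharp turns.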