The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns

Tongyang Li, Jiageng Ruan, Kaixuan Zhang
Green Energy and Intelligent Transportation, Vol. 4, No. 3, Article 100288, June 2025
DOI: 10.1016/j.geits.2025.100288
https://www.sciencedirect.com/science/article/pii/S2773153725000386
Learning-based algorithms have attracted great attention in the field of autonomous driving control, especially for decision-making, because they address the challenge of long-tail extreme scenarios, where traditional methods show poor adaptability despite significant effort. To improve autonomous driving performance in extreme scenarios, specifically consecutive sharp turns, this study proposes decision-making policies based on three deep reinforcement learning algorithms: Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). The role of the observation variable in agent training is examined by comparing the driving stability, average speed, and computational cost of the proposed algorithms on curves of various curvatures. In addition, a novel reward-setting method that combines the states of the environment and the vehicle is proposed to address the sparse-reward problem in reward-guided algorithms. Simulation results on a road with consecutive sharp turns show that vehicles based on the DDPG, SAC, and TD3 algorithms complete the task in 367.2 s, 359.6 s, and 302.1 s, respectively. These results match the training results and verify the role of the observation variable in improving agent quality.
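The abstract does not give the paper's actual reward formula, but the idea of a dense reward that combines environment states (road geometry) with vehicle states (offset, heading, speed) to avoid sparse-reward stalls can be sketched as follows. All function and parameter names here are hypothetical illustrations, not the authors' implementation:

```python
import math

def shaped_reward(lateral_offset, heading_error, speed, curvature,
                  lane_half_width=1.75, target_speed=15.0):
    """Hypothetical dense reward mixing environment and vehicle states.

    lateral_offset : signed distance from the lane centre line [m]
    heading_error  : angle between vehicle heading and road tangent [rad]
    speed          : current vehicle speed [m/s]
    curvature      : road curvature at the vehicle position [1/m]
    """
    # Terminal penalty: leaving the lane ends the episode with a large
    # negative reward instead of silence, which is what makes the signal dense.
    if abs(lateral_offset) > lane_half_width:
        return -10.0

    # Track-keeping term: 1.0 on the centre line, 0.0 at the lane edge.
    keep = 1.0 - abs(lateral_offset) / lane_half_width

    # Alignment term: rewards pointing along the road tangent.
    align = math.cos(heading_error)

    # Speed term: encourage progress, but lower the speed target in
    # tight curves so the agent is not pushed to overshoot sharp turns.
    curve_target = target_speed / (1.0 + 5.0 * abs(curvature))
    progress = 1.0 - min(abs(speed - curve_target) / curve_target, 1.0)

    return keep + align + progress
```

Under this sketch, a perfectly centred, aligned vehicle at the curve-adjusted target speed earns the maximum per-step reward of 3.0, while every deviation degrades the reward smoothly, so DDPG-, TD3-, or SAC-style actor-critic agents receive a learning signal at every step rather than only at episode completion.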