Evaluating end-to-end autonomous driving architectures: a proximal policy optimization approach in simulated environments

Autonomous Intelligent Systems | Pub Date: 2025-07-25 | DOI: 10.1007/s43684-025-00102-3

Ângelo Morgado, Kaoru Ota, Mianxiong Dong, Nuno Pombo


Citations: 0

Abstract

Autonomous driving systems (ADS) are at the forefront of technological innovation, promising enhanced safety, efficiency, and convenience in transportation. This study investigates the potential of end-to-end reinforcement learning (RL) architectures for ADS, specifically focusing on a Go-To-Point task involving lane-keeping and navigation through basic urban environments. The study uses the Proximal Policy Optimization (PPO) algorithm within the CARLA simulation environment. Traditional modular systems, which separate driving tasks into perception, decision-making, and control, provide interpretability and reliability in controlled scenarios but struggle with adaptability to dynamic, real-world conditions. In contrast, end-to-end systems offer a more integrated approach, potentially enhancing flexibility and decision-making cohesion.
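For readers unfamiliar with PPO, its core is the clipped surrogate objective from the original PPO formulation. The sketch below is a plain-Python illustration of that objective only; the function name, the batch layout, and the default clipping value `clip_eps=0.2` are assumptions for illustration and are not taken from the study.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    All names and the default clip_eps are illustrative assumptions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The `min` keeps the update conservative: once the ratio leaves the clipping interval in the direction that would increase the objective, the term stops growing, so a single batch cannot push the policy arbitrarily far from the one that collected the data.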

This research introduces CARLA-GymDrive, a novel framework integrating the CARLA simulator with the Gymnasium API, enabling seamless RL experimentation with both discrete and continuous action spaces. Through a two-phase training regimen, the study evaluates the efficacy of PPO in an end-to-end ADS focused on basic tasks like lane-keeping and waypoint navigation. A comparative analysis with modular architectures is also provided. The findings highlight the strengths of PPO in managing continuous control tasks, achieving smoother and more adaptable driving behaviors than value-based algorithms like Deep Q-Networks. However, challenges remain in generalization and computational demands, with end-to-end systems requiring extensive training time.
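The abstract does not describe the CARLA-GymDrive interface itself. As a rough illustration of what exposing a Go-To-Point task through Gymnasium-style `reset`/`step` signatures with either a discrete or a continuous action space might look like, here is a toy sketch. Everything in it is hypothetical: the class `ToyGoToPointEnv`, the `DISCRETE_ACTIONS` table, the 2-D kinematics, and the progress-based reward are stand-ins that do not use CARLA or Gymnasium and do not reflect the framework's actual API.

```python
import math

# Hypothetical discrete action table: index -> (throttle, steer).
DISCRETE_ACTIONS = {
    0: (0.5, 0.0),   # accelerate straight
    1: (0.5, -0.3),  # accelerate while steering left
    2: (0.5, 0.3),   # accelerate while steering right
    3: (0.0, 0.0),   # coast
}


class ToyGoToPointEnv:
    """Toy 2-D stand-in that mimics Gymnasium's reset/step return shapes."""

    def __init__(self, continuous=True, goal=(20.0, 0.0), max_steps=200):
        self.continuous = continuous
        self.goal = goal
        self.max_steps = max_steps

    def _distance(self):
        return math.hypot(self.goal[0] - self.x, self.goal[1] - self.y)

    def _obs(self):
        # Position, heading, and offset to the goal.
        return (self.x, self.y, self.heading,
                self.goal[0] - self.x, self.goal[1] - self.y)

    def reset(self, seed=None):
        # seed is accepted for signature compatibility; dynamics are deterministic.
        self.x, self.y, self.heading, self.steps = 0.0, 0.0, 0.0, 0
        return self._obs(), {}

    def step(self, action):
        # Both action spaces resolve to the same (throttle, steer) pair.
        if self.continuous:
            throttle, steer = action
        else:
            throttle, steer = DISCRETE_ACTIONS[action]
        throttle = max(0.0, min(1.0, throttle))
        steer = max(-1.0, min(1.0, steer))

        before = self._distance()
        self.heading += 0.1 * steer
        self.x += throttle * math.cos(self.heading)
        self.y += throttle * math.sin(self.heading)
        self.steps += 1
        after = self._distance()

        reward = before - after               # progress toward the goal
        terminated = after < 1.0              # reached the target point
        truncated = self.steps >= self.max_steps
        return self._obs(), reward, terminated, truncated, {}
```

Routing both action spaces through one `(throttle, steer)` pair means the same environment can be trained with a discrete-action or a continuous-action policy by changing only the `continuous` flag, which is the kind of side-by-side experiment the abstract mentions.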

While the study underscores the potential of end-to-end architectures, it also identifies limitations in scalability and real-world applicability, suggesting that modular systems may currently be more feasible for practical ADS deployment. Nonetheless, the CARLA-GymDrive framework and the insights gained from PPO-based ADS contribute significantly to the field, laying a foundation for future advancements in autonomous driving.


Source journal

Autonomous Intelligent Systems

CiteScore

3.90

Self-citation rate

0.00%

Articles published