VLM-RL: A unified vision language models and reinforcement learning framework for safe autonomous driving

IF 7.6 · CAS Zone 1 (Engineering & Technology) · Q1 TRANSPORTATION SCIENCE & TECHNOLOGY
Zilin Huang , Zihao Sheng , Yansong Qu , Junwei You , Sikai Chen
{"title":"VLM-RL:用于安全自动驾驶的统一视觉语言模型和强化学习框架","authors":"Zilin Huang ,&nbsp;Zihao Sheng ,&nbsp;Yansong Qu ,&nbsp;Junwei You ,&nbsp;Sikai Chen","doi":"10.1016/j.trc.2025.105321","DOIUrl":null,"url":null,"abstract":"<div><div>In recent years, reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community and have achieved remarkable progress in various driving scenarios. However, traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability. To address these limitations, we propose <strong>VLM-RL</strong>, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals using image observation and natural language goals. The core of VLM-RL is the contrasting language goal (CLG)-as-reward paradigm, which uses positive and negative language goals to generate semantic rewards. We further introduce a hierarchical reward synthesis approach that combines CLG-based semantic rewards with vehicle state information, improving reward stability and offering a more comprehensive reward signal. Additionally, a batch-processing technique is employed to optimize computational efficiency during training. Extensive experiments in the CARLA simulator demonstrate that VLM-RL outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in route completion rate, and robust generalization to unseen driving scenarios. Furthermore, VLM-RL can seamlessly integrate almost any standard RL algorithms, potentially revolutionizing the existing RL paradigm that relies on manual reward engineering and enabling continuous performance improvements. The demo video and code can be accessed at: <span><span>https://zilin-huang.github.io/VLM-RL-website/</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54417,"journal":{"name":"Transportation Research Part C-Emerging Technologies","volume":"180 ","pages":"Article 105321"},"PeriodicalIF":7.6000,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"VLM-RL: A unified vision language models and reinforcement learning framework for safe autonomous driving\",\"authors\":\"Zilin Huang ,&nbsp;Zihao Sheng ,&nbsp;Yansong Qu ,&nbsp;Junwei You ,&nbsp;Sikai Chen\",\"doi\":\"10.1016/j.trc.2025.105321\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In recent years, reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community and have achieved remarkable progress in various driving scenarios. However, traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability. To address these limitations, we propose <strong>VLM-RL</strong>, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals using image observation and natural language goals. The core of VLM-RL is the contrasting language goal (CLG)-as-reward paradigm, which uses positive and negative language goals to generate semantic rewards. We further introduce a hierarchical reward synthesis approach that combines CLG-based semantic rewards with vehicle state information, improving reward stability and offering a more comprehensive reward signal. 
Additionally, a batch-processing technique is employed to optimize computational efficiency during training. Extensive experiments in the CARLA simulator demonstrate that VLM-RL outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in route completion rate, and robust generalization to unseen driving scenarios. Furthermore, VLM-RL can seamlessly integrate almost any standard RL algorithms, potentially revolutionizing the existing RL paradigm that relies on manual reward engineering and enabling continuous performance improvements. The demo video and code can be accessed at: <span><span>https://zilin-huang.github.io/VLM-RL-website/</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":54417,\"journal\":{\"name\":\"Transportation Research Part C-Emerging Technologies\",\"volume\":\"180 \",\"pages\":\"Article 105321\"},\"PeriodicalIF\":7.6000,\"publicationDate\":\"2025-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Transportation Research Part C-Emerging Technologies\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0968090X25003250\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"TRANSPORTATION SCIENCE & TECHNOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part C-Emerging Technologies","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968090X25003250","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, reinforcement learning (RL)-based methods for learning driving policies have gained increasing attention in the autonomous driving community and have achieved remarkable progress in various driving scenarios. However, traditional RL approaches rely on manually engineered rewards, which require extensive human effort and often lack generalizability. To address these limitations, we propose VLM-RL, a unified framework that integrates pre-trained Vision-Language Models (VLMs) with RL to generate reward signals from image observations and natural language goals. The core of VLM-RL is the contrasting language goal (CLG)-as-reward paradigm, which uses positive and negative language goals to generate semantic rewards. We further introduce a hierarchical reward synthesis approach that combines CLG-based semantic rewards with vehicle state information, improving reward stability and offering a more comprehensive reward signal. Additionally, a batch-processing technique is employed to optimize computational efficiency during training. Extensive experiments in the CARLA simulator demonstrate that VLM-RL outperforms state-of-the-art baselines, achieving a 10.5% reduction in collision rate, a 104.6% increase in route completion rate, and robust generalization to unseen driving scenarios. Furthermore, VLM-RL can seamlessly integrate almost any standard RL algorithm, potentially revolutionizing the existing RL paradigm that relies on manual reward engineering and enabling continuous performance improvements. The demo video and code can be accessed at: https://zilin-huang.github.io/VLM-RL-website/.
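The abstract describes the CLG-as-reward idea only at a high level. The sketch below is a minimal illustration rather than the authors' implementation: it assumes a CLIP-style backbone loaded via Hugging Face transformers (openai/clip-vit-base-patch32), hypothetical goal sentences, and placeholder weights. It computes a semantic reward as the image's similarity to a positive goal minus its similarity to a negative goal, then blends that with simple vehicle-state terms as a stand-in for the paper's hierarchical reward synthesis.

```python
# Minimal, illustrative sketch of a CLG-style semantic reward.
# Assumptions (not from the paper): a CLIP backbone from Hugging Face
# transformers, hypothetical goal sentences, and ad-hoc weights.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

MODEL_NAME = "openai/clip-vit-base-patch32"  # assumed VLM backbone
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

# Contrasting language goals (hypothetical wording).
POSITIVE_GOAL = "the ego vehicle drives safely in its lane"
NEGATIVE_GOAL = "the ego vehicle collides with another car or leaves the road"


@torch.no_grad()
def clg_semantic_reward(frame: Image.Image) -> float:
    """Semantic reward: similarity to the positive goal minus
    similarity to the negative goal, both in CLIP embedding space."""
    inputs = processor(
        text=[POSITIVE_GOAL, NEGATIVE_GOAL],
        images=frame,
        return_tensors="pt",
        padding=True,
    )
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    sims = F.cosine_similarity(img_emb, txt_emb)  # [sim_positive, sim_negative]
    return (sims[0] - sims[1]).item()


def hierarchical_reward(frame: Image.Image, speed: float,
                        lane_offset: float, collided: bool) -> float:
    """Toy stand-in for hierarchical reward synthesis: blend the CLG
    semantic reward with vehicle-state terms (weights are placeholders)."""
    r_semantic = clg_semantic_reward(frame)
    r_state = 0.1 * speed - 0.5 * abs(lane_offset) - 10.0 * float(collided)
    return 0.5 * r_semantic + 0.5 * r_state
```

In the paper this signal is further stabilized and evaluated in batches during training; the point here is only to show how two contrasting goals can turn a frozen VLM into a dense reward source without hand-crafted reward engineering.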
Source journal: Transportation Research Part C: Emerging Technologies
CiteScore: 15.80 · Self-citation rate: 12.00% · Articles published: 332 · Review time: 64 days
Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.