Lin Li, Runjia Tan, Jianwu Fang, Jianru Xue, Chen Lv
{"title":"基于llm增强分层强化学习的类人自动驾驶决策","authors":"Lin Li , Runjia Tan , Jianwu Fang , Jianru Xue , Chen Lv","doi":"10.1016/j.eswa.2025.128736","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods inherently struggle to be deployed in real-world due to their limited generalization to rare but safety-critical scenarios and low sample efficiency, resulting in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM), to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability and inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions, acceleration and steering, that can achieve the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intents, we incorporate a human-in-the-loop reward design process.Specifically, the LLM contributes to reward design by generating structurally diverse functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"294 ","pages":"Article 128736"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous driving\",\"authors\":\"Lin Li , Runjia Tan , Jianwu Fang , Jianru Xue , Chen Lv\",\"doi\":\"10.1016/j.eswa.2025.128736\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods inherently struggle to be deployed in real-world due to their limited generalization to rare but safety-critical scenarios and low sample efficiency, resulting in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM), to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. 
Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability and inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions, acceleration and steering, that can achieve the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intents, we incorporate a human-in-the-loop reward design process.Specifically, the LLM contributes to reward design by generating structurally diverse functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"294 \",\"pages\":\"Article 128736\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425023541\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425023541","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
LLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous driving
Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods are inherently difficult to deploy in real-world settings due to their limited generalization to rare but safety-critical scenarios and their low sample efficiency, which results in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM) to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability against inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions (acceleration and steering) that achieves the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intent, we incorporate a human-in-the-loop reward design process. Specifically, the LLM contributes to reward design by generating structurally diverse reward functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.
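The abstract does not give the exact formulations, but the two learning-side ideas can be made concrete. Below is a minimal Python sketch, assuming (i) the GGT mechanism can be read as a shaping term that rewards progress toward the LLM-generated goal point, and (ii) the human-in-the-loop reward design ranks RL-generated trajectory pairs with a Bradley-Terry preference model; the function names and specific forms here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def goal_gradient_reward(pos, prev_pos, goal, scale=1.0):
    """Hypothetical GGT-style shaping term: positive when the agent
    moves closer to the LLM-generated long-term goal point."""
    prev_dist = np.linalg.norm(np.asarray(prev_pos) - np.asarray(goal))
    dist = np.linalg.norm(np.asarray(pos) - np.asarray(goal))
    return scale * (prev_dist - dist)

def preference_loss(r_preferred, r_other):
    """Bradley-Terry negative log-likelihood for one trajectory pair:
    a candidate reward function should score the expert-preferred
    trajectory higher than the alternative (assumed objective)."""
    return -np.log(1.0 / (1.0 + np.exp(r_other - r_preferred)))

# Ego vehicle moved from (3, 0) to (4, 0); LLM goal point at (10, 0).
print(goal_gradient_reward((4, 0), (3, 0), (10, 0)))   # 1.0: one unit of progress
# Candidate-reward sums over a preferred and a non-preferred trajectory.
print(preference_loss(r_preferred=12.0, r_other=9.5))  # ~0.079: ranking agrees
```

Under this reading, the shaping term supplies a dense gradient toward an otherwise sparse long-term goal, and minimizing the preference loss over many expert-labeled pairs nudges the candidate reward functions toward human values.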
Journal introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.