Lin Li, Runjia Tan, Jianwu Fang, Jianru Xue, Chen Lv
{"title":"基于llm增强分层强化学习的类人自动驾驶决策","authors":"Lin Li , Runjia Tan , Jianwu Fang , Jianru Xue , Chen Lv","doi":"10.1016/j.eswa.2025.128736","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods inherently struggle to be deployed in real-world due to their limited generalization to rare but safety-critical scenarios and low sample efficiency, resulting in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM), to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability and inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions, acceleration and steering, that can achieve the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intents, we incorporate a human-in-the-loop reward design process.Specifically, the LLM contributes to reward design by generating structurally diverse functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.</div></div>","PeriodicalId":50461,"journal":{"name":"Expert Systems with Applications","volume":"294 ","pages":"Article 128736"},"PeriodicalIF":7.5000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"LLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous driving\",\"authors\":\"Lin Li , Runjia Tan , Jianwu Fang , Jianru Xue , Chen Lv\",\"doi\":\"10.1016/j.eswa.2025.128736\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods inherently struggle to be deployed in real-world due to their limited generalization to rare but safety-critical scenarios and low sample efficiency, resulting in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM), to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. 
Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability and inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions, acceleration and steering, that can achieve the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intents, we incorporate a human-in-the-loop reward design process.Specifically, the LLM contributes to reward design by generating structurally diverse functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.</div></div>\",\"PeriodicalId\":50461,\"journal\":{\"name\":\"Expert Systems with Applications\",\"volume\":\"294 \",\"pages\":\"Article 128736\"},\"PeriodicalIF\":7.5000,\"publicationDate\":\"2025-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Expert Systems with Applications\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0957417425023541\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Expert Systems with Applications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0957417425023541","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
LLM-augmented hierarchical reinforcement learning for human-like decision-making of autonomous driving
Reinforcement Learning (RL) has shown great promise for autonomous driving decision-making. However, such data-driven methods are inherently difficult to deploy in real-world settings due to their limited generalization to rare but safety-critical scenarios and their low sample efficiency, which results in high computational costs. To address these challenges, we propose a hierarchical RL framework augmented with a large language model (LLM) to enhance decision-making in complex driving environments through semantic understanding and commonsense knowledge. Inspired by human drivers, the LLM serves as an expert high-level planner that interprets textual descriptions of driving scenarios to generate a long-term goal point, a recommended meta-action, and a corresponding explanation, thereby navigating complex environments effectively. To meet real-time requirements, the high-level LLM module operates at a reduced frequency, balancing reasoning capability against inference latency. At the low level, however, it remains challenging for the RL agent to learn a sequence of continuous short-term actions (acceleration and steering) that achieves the high-level goal while ensuring safety and efficiency. To bridge this gap, we introduce a Goal Gradient-based Transfer (GGT) mechanism that embeds an explicit gradient toward the LLM-generated goal, facilitating efficient policy learning. Additionally, to align the learned behaviors with human intent, we incorporate a human-in-the-loop reward design process. Specifically, the LLM contributes to reward design by generating structurally diverse reward functions, which are iteratively optimized using expert preferences over RL-generated trajectory pairs to ensure alignment with human values and safety. Overall, experimental comparisons in the CARLA simulator demonstrate that the proposed framework significantly improves generalization, interpretability, and human alignment in diverse and unseen driving scenarios.
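The abstract does not give the exact formulations, but the two learning-side ideas can be made concrete. Below is a minimal Python sketch, assuming (i) the GGT mechanism can be read as a shaping term that rewards progress toward the LLM-generated goal point, and (ii) the human-in-the-loop reward design ranks RL-generated trajectory pairs with a Bradley-Terry preference model; the function names and specific forms here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def goal_gradient_reward(pos, prev_pos, goal, scale=1.0):
    """Hypothetical GGT-style shaping term: positive when the agent
    moves closer to the LLM-generated long-term goal point."""
    prev_dist = np.linalg.norm(np.asarray(prev_pos) - np.asarray(goal))
    dist = np.linalg.norm(np.asarray(pos) - np.asarray(goal))
    return scale * (prev_dist - dist)

def preference_loss(r_preferred, r_other):
    """Bradley-Terry negative log-likelihood for one trajectory pair:
    a candidate reward function should score the expert-preferred
    trajectory higher than the alternative (assumed objective)."""
    return -np.log(1.0 / (1.0 + np.exp(r_other - r_preferred)))

# Ego vehicle moved from (3, 0) to (4, 0); LLM goal point at (10, 0).
print(goal_gradient_reward((4, 0), (3, 0), (10, 0)))   # 1.0: one unit of progress
# Candidate-reward sums over a preferred and a non-preferred trajectory.
print(preference_loss(r_preferred=12.0, r_other=9.5))  # ~0.079: ranking agrees
```

Under this reading, the shaping term supplies a dense gradient toward an otherwise sparse long-term goal, and minimizing the preference loss over many expert-labeled pairs nudges the candidate reward functions toward human values.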
Journal introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.