{"title":"自动驾驶事故响应的生成策略驱动HAC强化学习","authors":"Hongtao Zhang , Jin-Qiang Wang , Shengjie Zhang , Yuanbo Jiang , Mengling Li , Binbin Yong , Qingguo Zhou , Xiaokang Zhou","doi":"10.1016/j.future.2025.108106","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) has become a pivotal approach in autonomous driving decision problems owing to its superior decision optimization capabilities. Existing discrete-time RL frameworks based on Markov decision process modeling face significant challenges in incident response control processes. These approaches lead to high collision rates during low-frequency decision-making and severe action oscillations during high-frequency decision-making. The fundamental limitation is that discrete-time RL methods cannot adapt to real driving scenarios where vehicle decisions rely on continuous-time dynamic system modeling. To address this, in this paper, we propose a generative policy-driven Hamilton-Jacobi-Bellman Actor-Critic (HAC) RL framework, which leverages the Actor to generate action policies and extends continuous-time Hamilton-Jacobi-Bellman capabilities to discrete-time Actor-Critic frameworks through Lipschitz constraints on vehicle control actions. Specifically, the HAC framework integrates deep deterministic policy gradient (DDPG) to implement the HJ-DDPG that incorporates two optimization approaches including delayed policy network updates and dynamic parameter space noise to enhance policy evaluation accuracy and exploration capability. Experimental results demonstrate that vehicles trained using the proposed method achieved 52 % lower average jerk and 48 % reduced steering rates compared to baseline method (Proximal Policy Optimization, PPO) under high-speed conditions, resulting in smoother and safer lane-changing maneuvers.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"175 ","pages":"Article 108106"},"PeriodicalIF":6.2000,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Generative policy-driven HAC reinforcement learning for autonomous driving incident response\",\"authors\":\"Hongtao Zhang , Jin-Qiang Wang , Shengjie Zhang , Yuanbo Jiang , Mengling Li , Binbin Yong , Qingguo Zhou , Xiaokang Zhou\",\"doi\":\"10.1016/j.future.2025.108106\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement learning (RL) has become a pivotal approach in autonomous driving decision problems owing to its superior decision optimization capabilities. Existing discrete-time RL frameworks based on Markov decision process modeling face significant challenges in incident response control processes. These approaches lead to high collision rates during low-frequency decision-making and severe action oscillations during high-frequency decision-making. The fundamental limitation is that discrete-time RL methods cannot adapt to real driving scenarios where vehicle decisions rely on continuous-time dynamic system modeling. To address this, in this paper, we propose a generative policy-driven Hamilton-Jacobi-Bellman Actor-Critic (HAC) RL framework, which leverages the Actor to generate action policies and extends continuous-time Hamilton-Jacobi-Bellman capabilities to discrete-time Actor-Critic frameworks through Lipschitz constraints on vehicle control actions. 
Specifically, the HAC framework integrates deep deterministic policy gradient (DDPG) to implement the HJ-DDPG that incorporates two optimization approaches including delayed policy network updates and dynamic parameter space noise to enhance policy evaluation accuracy and exploration capability. Experimental results demonstrate that vehicles trained using the proposed method achieved 52 % lower average jerk and 48 % reduced steering rates compared to baseline method (Proximal Policy Optimization, PPO) under high-speed conditions, resulting in smoother and safer lane-changing maneuvers.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"175 \",\"pages\":\"Article 108106\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-08-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25004005\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25004005","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Generative policy-driven HAC reinforcement learning for autonomous driving incident response
Reinforcement learning (RL) has become a pivotal approach to autonomous driving decision problems owing to its superior decision optimization capabilities. Existing discrete-time RL frameworks based on Markov decision process modeling face significant challenges in incident response control: they lead to high collision rates under low-frequency decision-making and severe action oscillations under high-frequency decision-making. The fundamental limitation is that discrete-time RL methods cannot adapt to real driving scenarios, where vehicle decisions rely on continuous-time dynamic system modeling. To address this, we propose a generative policy-driven Hamilton-Jacobi-Bellman Actor-Critic (HAC) RL framework, which leverages the Actor to generate action policies and extends continuous-time Hamilton-Jacobi-Bellman capabilities to discrete-time Actor-Critic frameworks through Lipschitz constraints on vehicle control actions. Specifically, the HAC framework integrates deep deterministic policy gradient (DDPG) to implement HJ-DDPG, which incorporates two optimizations, delayed policy network updates and dynamic parameter-space noise, to enhance policy evaluation accuracy and exploration capability. Experimental results demonstrate that vehicles trained with the proposed method achieved 52% lower average jerk and 48% lower steering rates than the baseline method (Proximal Policy Optimization, PPO) under high-speed conditions, resulting in smoother and safer lane-changing maneuvers.
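The abstract names three concrete mechanisms: delayed policy network updates, parameter-space noise for exploration, and a Lipschitz-style bound on how fast control actions may change between steps. Below is a minimal, hedged sketch of how a DDPG-style update loop with these three ingredients could look; it is not the paper's implementation, and the network sizes, noise scale, Lipschitz bound `max_action_delta`, and update frequencies are illustrative assumptions.

```python
# Illustrative HJ-DDPG-style sketch (PyTorch). Hyperparameters are assumptions,
# not values taken from the paper.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )
    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def perturb_actor(actor, noise_std=0.05):
    """Parameter-space noise: act with a perturbed copy of the actor."""
    noisy = copy.deepcopy(actor)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(noise_std * torch.randn_like(p))
    return noisy

def lipschitz_clip(action, prev_action, max_action_delta=0.1):
    """Bound the per-step change of the control action (smoothness constraint)."""
    return prev_action + torch.clamp(action - prev_action,
                                     -max_action_delta, max_action_delta)

def update(batch, actor, critic, actor_target, critic_target,
           actor_opt, critic_opt, step, gamma=0.99, tau=0.005, policy_delay=2):
    state, action, reward, next_state, done = batch

    # Critic update on every step.
    with torch.no_grad():
        target_q = critic_target(next_state, actor_target(next_state))
        y = reward + gamma * (1.0 - done) * target_q
    critic_loss = F.mse_loss(critic(state, action), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Delayed policy update: the actor moves only every `policy_delay` steps,
    # so the critic's value estimate has time to stabilize first.
    if step % policy_delay == 0:
        actor_loss = -critic(state, actor(state)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        # Soft (Polyak) update of the target networks.
        with torch.no_grad():
            for p, tp in zip(actor.parameters(), actor_target.parameters()):
                tp.mul_(1 - tau).add_(tau * p)
            for p, tp in zip(critic.parameters(), critic_target.parameters()):
                tp.mul_(1 - tau).add_(tau * p)
```

In this sketch, `lipschitz_clip` would be applied to the actor's output at environment-interaction time, which is one plausible way to enforce a bound on action rates (jerk and steering rate); the paper's actual Lipschitz constraint may be formulated differently, for example as a term in the loss.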
Journal introduction:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.