A heterogeneous agent reinforcement learning approach with curriculum learning for variable speed limit control

IF 7.5 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)

Expert Systems with Applications · Pub Date: 2025-10-09 · DOI: 10.1016/j.eswa.2025.129945

Zhaoqing Li, Silai Chen, Guosheng Xiao, Yangsheng Jiang, Zhihong Yao, Puxin Yang

Volume 299, Article 129945 · https://www.sciencedirect.com/science/article/pii/S0957417425035602

Citations: 0

Abstract

Most existing Variable Speed Limit (VSL) studies employ rule-based control strategies, which often fail to accommodate diverse road characteristics and respond poorly to sudden congestion. These limitations are compounded by computational inefficiency, which hinders real-time decision-making in large-scale or complex traffic environments. To address these issues, this study proposes a VSL strategy based on Heterogeneous Agent Reinforcement Learning with Curriculum Learning (HARLCL). Specifically, a top-level agent classifies road segments using the Mini-Batch K-means algorithm, thereby capturing the heterogeneity of each segment. The lower-level agents operate with class-specific observation and action spaces: each agent observes and acts on a feature set tailored to its segment class, while all agents share the same reward design and learning architecture. The lower-level agents then adaptively adjust both the position and length of each controlled segment according to the classification results and real-time traffic conditions. Through a reward-driven mechanism, these agents continually refine their control precision. Moreover, curriculum learning is introduced during multi-agent training, accelerating convergence and reducing the computational burden typically encountered in large-scale reinforcement learning. Experiments were conducted in three task scenarios (typical fixed bottlenecks, random task bottlenecks, and multiple bottlenecks) using realistic road data. The results indicate that training time decreased by 43.18 % and 47.35 %; in the SUMO microscopic simulation, total travel time was 312.25 veh·h, 43.05 % lower than the NoVSL baseline, and safe braking distances also decreased, indicating improved safety and more stable traffic flow. Compared with conventional feedback control and deep reinforcement learning approaches, the HARLCL-based strategy demonstrates substantial advantages in training efficiency and control precision, offering a promising avenue for practical VSL implementation.
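To make the top-level classification step concrete, the following is a minimal pure-Python sketch of mini-batch K-means applied to road segments. It is not the authors' implementation: the three normalized features per segment (lane count, mean speed, ramp density), the sample values, the choice of k = 3, and the farthest-first seeding are all assumptions made for illustration. Only the use of Mini-Batch K-means itself comes from the abstract.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))

def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, chosen for reproducibility):
    start at points[0], then repeatedly add the point farthest from the
    centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def mini_batch_kmeans(points, k, batch_size=4, n_iter=100, seed=0):
    """Mini-batch K-means: each iteration draws a small random batch,
    assigns it to the current centers, then moves each center toward its
    assigned points with a per-center learning rate of 1 / count."""
    rng = random.Random(seed)
    centers = farthest_first_init(points, k)
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        # Assign the whole batch before updating any center.
        assigned = [nearest(p, centers) for p in batch]
        for p, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1.0 - eta) * m + eta * x
                          for m, x in zip(centers[c], p)]
    return centers

# Hypothetical normalized features per segment:
# (lane count, mean speed, ramp density). Values are illustrative only.
SEGMENTS = [
    (0.90, 0.90, 0.10), (0.85, 0.95, 0.05), (0.95, 0.85, 0.10),  # basic mainline
    (0.50, 0.50, 0.90), (0.55, 0.45, 0.95), (0.45, 0.50, 0.85),  # merge areas
    (0.10, 0.20, 0.10), (0.15, 0.15, 0.05), (0.05, 0.20, 0.15),  # lane drops
]

centers = mini_batch_kmeans(SEGMENTS, k=3)
labels = [nearest(seg, centers) for seg in SEGMENTS]
print(labels)
```

In a setup like the one the abstract describes, each resulting class label would then select the observation and action space used by the lower-level agent controlling that segment. How that mapping is defined is not stated in the abstract.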

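Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months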

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.