Large language model-enhanced reinforcement learning for generic bus holding control strategies

IF 8.3 | CAS Tier 1 (Engineering & Technology) | JCR Q1 (Economics)
Jiajie Yu, Yuhong Wang, Wei Ma
{"title":"Large language model-enhanced reinforcement learning for generic bus holding control strategies","authors":"Jiajie Yu,&nbsp;Yuhong Wang,&nbsp;Wei Ma","doi":"10.1016/j.tre.2025.104142","DOIUrl":null,"url":null,"abstract":"<div><div>Bus holding control is a widely-adopted strategy for maintaining stability and improving the operational efficiency of bus systems. Traditional model-based methods often face challenges with the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL determines the optimal control strategies in order to maximize the cumulative reward, which reflects the overall control goals. However, translating sparse and delayed control goals in real-world tasks into dense and real-time rewards for RL is challenging, normally requiring extensive manual trial-and-error. In view of this, this study introduces an automatic reward generation paradigm by leveraging the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed the LLM-enhanced RL, comprises several LLM-based modules: reward initializer, reward modifier, performance analyzer, and reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to the feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents’ performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to extensive bus holding control scenarios that vary in the number of bus lines, stops, and passenger demand. The results demonstrate the superiority, generalization capability, and robustness of the proposed paradigm compared to vanilla RL strategies, the LLM-based controller, physics-based feedback controllers, and optimization-based controllers. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications.</div></div>","PeriodicalId":49418,"journal":{"name":"Transportation Research Part E-Logistics and Transportation Review","volume":"200 ","pages":"Article 104142"},"PeriodicalIF":8.3000,"publicationDate":"2025-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part E-Logistics and Transportation Review","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1366554525001838","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECONOMICS","Score":null,"Total":0}
Citations: 0

Abstract

Bus holding control is a widely adopted strategy for maintaining the stability and improving the operational efficiency of bus systems. Traditional model-based methods often suffer from the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL derives optimal control strategies by maximizing a cumulative reward that reflects the overall control goals. However, translating the sparse and delayed control goals of real-world tasks into dense, real-time rewards for RL is challenging and normally requires extensive manual trial-and-error. In view of this, this study introduces an automatic reward generation paradigm that leverages the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed LLM-enhanced RL, comprises several LLM-based modules: a reward initializer, a reward modifier, a performance analyzer, and a reward refiner. These modules cooperate to initialize and iteratively improve the reward function based on feedback from training and test results for the specified RL task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents' performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to a wide range of bus holding control scenarios that vary in the number of bus lines and stops and in passenger demand. The results demonstrate the superiority, generalization capability, and robustness of the proposed paradigm over vanilla RL strategies, an LLM-based controller, physics-based feedback controllers, and optimization-based controllers. This study sheds light on the great potential of LLMs in various smart mobility applications.
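To make the paradigm concrete, the sketch below outlines the iterative reward-generation loop implied by the abstract: an LLM drafts a reward, an RL agent is trained and tested with it, the outcome is analyzed, and the reward is refined, with ineffective candidates filtered out. Everything here is an illustrative assumption rather than the authors' implementation: the names query_llm and evaluate_reward, the placeholder headway-variance reward, and the greedy acceptance rule are invented for the sketch.

```python
import random
from dataclasses import dataclass


@dataclass
class RewardCandidate:
    source: str                   # reward-function code proposed by the LLM
    score: float = float("-inf")  # task performance after RL training with this reward


def query_llm(prompt: str) -> str:
    """Stand-in for one LLM call (reward initializer, modifier, analyzer, or refiner)."""
    return "reward = -headway_variance - 0.1 * holding_time"  # placeholder LLM output


def evaluate_reward(reward_source: str) -> float:
    """Stand-in for training an RL agent in a bus-holding simulator and testing it."""
    return random.random()  # placeholder score, e.g., negative mean passenger waiting time


def llm_enhanced_rl(task: str, iterations: int = 5) -> RewardCandidate:
    # 1. Reward initializer: draft a dense, real-time reward from the task description.
    best = RewardCandidate(query_llm(f"Initialize a reward function for: {task}"))
    best.score = evaluate_reward(best.source)
    for _ in range(iterations):
        # 2. Performance analyzer: turn training/test feedback into a diagnosis.
        analysis = query_llm(f"Analyze results (score={best.score:.3f}) of:\n{best.source}")
        # 3. Reward modifier / refiner: rewrite the reward according to the diagnosis.
        candidate = RewardCandidate(query_llm(f"Refine:\n{best.source}\nFeedback: {analysis}"))
        candidate.score = evaluate_reward(candidate.source)
        # 4. Filter out ineffective rewards so agent performance evolves stably over iterations.
        if candidate.score > best.score:
            best = candidate
    return best


if __name__ == "__main__":
    print(llm_enhanced_rl("bus holding control on a multi-line corridor"))
```

In the paper's full pipeline, the evaluation step would correspond to training RL agents on simulated bus lines with varying stops and passenger demand; the greedy acceptance test above is just one simple way to realize the filtering of ineffective reward functions.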
Source journal
CiteScore: 16.20
Self-citation rate: 16.00%
Articles published: 285
Review time: 62 days
Journal description: Transportation Research Part E: Logistics and Transportation Review is a reputable journal that publishes high-quality articles covering a wide range of topics in the field of logistics and transportation research. The journal welcomes submissions on various subjects, including transport economics, transport infrastructure and investment appraisal, evaluation of public policies related to transportation, empirical and analytical studies of logistics management practices and performance, logistics and operations models, and logistics and supply chain management. Part E aims to provide informative and well-researched articles that contribute to the understanding and advancement of the field. The content of the journal is complementary to other prestigious journals in transportation research, such as Transportation Research Part A: Policy and Practice, Part B: Methodological, Part C: Emerging Technologies, Part D: Transport and Environment, and Part F: Traffic Psychology and Behaviour. Together, these journals form a comprehensive and cohesive reference for current research in transportation science.