A reinforcement learning-based dynamic multi-objective optimization approach for integrated timetabling and vehicle scheduling

IF 7.2 · Zone 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems · Pub Date: 2025-05-09 · DOI: 10.1016/j.knosys.2025.113735

Yindong Shen, Wenliang Xie

Knowledge-Based Systems, volume 321, Article 113735 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0950705125007816

Citations: 0

Abstract

Dynamic integrated timetabling and vehicle scheduling (D-ITVS) is essential for mitigating the negative impacts of service disruptions. It involves multiple rescheduling stages, and the optimization problems solved at these stages are inherently similar. However, existing optimization approaches for the D-ITVS problem have not systematically exploited these similarities, overlooking the potential for decision knowledge from previous stages to inform the current stage. To address this gap, this paper proposes a reinforcement learning-based dynamic multi-objective optimization approach (RL-DMOA), which focuses on transferring decision knowledge between rescheduling stages. This approach models the optimization process of each rescheduling stage in the D-ITVS problem as a Markov decision process, incorporating a state space with vehicle information, an action space for vehicle assignment, and a multi-objective reward function. A multi-objective deep reinforcement learning (M-DRL) agent is employed within the RL-DMOA to select actions based on the state at each decision point. The agent is constructed on a multi-objective deep Q-learning network (M-DQN), with a Q-value adjustment layer incorporated to prevent the selection of invalid actions. To select optimal actions while balancing the conflicts among multiple objectives, the M-DRL agent applies a non-dominated sorting selection strategy. Experimental results demonstrate that the proposed RL-DMOA is capable of generating timetables and vehicle schedules with reduced costs, enhanced robustness, and improved convergence and diversity across all rescheduling stages. By balancing operational costs and passenger service quality, these improvements benefit transit operators, while passengers enjoy reduced travel costs and enhanced service reliability during daily operations.
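The abstract names two mechanisms without giving details: a Q-value adjustment layer that rules out invalid actions, and a non-dominated sorting strategy for choosing among actions with vector-valued Q estimates. The sketch below is one plausible reading, not the authors' implementation. It assumes that every objective is maximized, that invalid actions are suppressed by overwriting their Q-vectors with a large negative constant, and that "selection" means keeping the first Pareto front. The names `PENALTY`, `dominates`, `adjust_q_values`, and `non_dominated_actions` are hypothetical, and the toy numbers are made up for illustration.

```python
from typing import List, Sequence

# Assumed "very bad" value; it must lie below any real Q-value.
PENALTY = -1e9


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, with every objective maximized."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def adjust_q_values(q_values: Sequence[Sequence[float]],
                    valid: Sequence[bool]) -> List[List[float]]:
    """Hypothetical stand-in for a Q-value adjustment layer.

    The Q-vector of each invalid action is overwritten with PENALTY in
    every objective, so any valid action dominates it.
    """
    return [list(q) if ok else [PENALTY] * len(q) for q, ok in zip(q_values, valid)]


def non_dominated_actions(q_values: Sequence[Sequence[float]],
                          valid: Sequence[bool]) -> List[int]:
    """Return indices of actions on the first Pareto front after masking."""
    if not any(valid):
        raise ValueError("no valid action at this decision point")
    adjusted = adjust_q_values(q_values, valid)
    return [
        i for i, qi in enumerate(adjusted)
        if not any(dominates(qj, qi) for j, qj in enumerate(adjusted) if j != i)
    ]


# Toy decision point: 4 candidate vehicle assignments, 2 objectives.
q = [[-3.0, -1.0], [-2.0, -2.0], [-4.0, -4.0], [-1.0, -0.5]]
valid = [True, True, True, False]  # action 3 looks best but is infeasible
front = non_dominated_actions(q, valid)
print(front)  # [0, 1]
```

Actions 0 and 1 trade off the two objectives and neither dominates the other, action 2 is dominated by both, and action 3 is excluded by the mask. How a single action is then drawn from the front (for example by crowding distance or at random) is not stated in the abstract and is left open here.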




Source journal

Knowledge-Based Systems (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore

14.80

Self-citation rate

12.50%

Articles published

1245

Review time

7.8 months

About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.