A reinforcement learning-based dynamic multi-objective optimization approach for integrated timetabling and vehicle scheduling
Authors: Yindong Shen, Wenliang Xie
Journal: Knowledge-Based Systems, vol. 321, Article 113735 (published 2025-05-09)
Impact factor: 7.2; JCR: Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1016/j.knosys.2025.113735
URL: https://www.sciencedirect.com/science/article/pii/S0950705125007816
Citation count: 0
Abstract
Dynamic integrated timetabling and vehicle scheduling (D-ITVS) is essential for mitigating the negative impacts of service disruptions. It involves multiple rescheduling stages, with inherent optimization similarities across these stages. However, existing optimization approaches for the D-ITVS problem have not systematically exploited these similarities, overlooking the potential for decision knowledge from previous stages to inform the current stage. To address this gap, this paper proposes a reinforcement learning-based dynamic multi-objective optimization approach (RL-DMOA), which focuses on transferring decision knowledge between rescheduling stages. This approach models the optimization process of each rescheduling stage in the D-ITVS problem as a Markov decision process, incorporating a state space with vehicle information, an action space for vehicle assignment, and a multi-objective reward function. A multi-objective deep reinforcement learning (M-DRL) agent is employed within the RL-DMOA to select actions based on the state at each decision point. The agent is built on a multi-objective deep Q-learning network (M-DQN), with a Q-value adjustment layer incorporated to prevent the selection of invalid actions. To select optimal actions while balancing the conflicts among multiple objectives, the M-DRL agent applies a non-dominated sorting selection strategy. Experimental results demonstrate that the proposed RL-DMOA generates timetables and vehicle schedules with reduced costs, enhanced robustness, and improved convergence and diversity across all rescheduling stages. By balancing operational costs and passenger service quality, these improvements benefit transit operators, while passengers enjoy reduced travel costs and enhanced service reliability during daily operations.
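The abstract does not specify how the Q-value adjustment layer or the non-dominated sorting selection are implemented. The following is only a minimal sketch of the general idea, under stated assumptions: each action has one Q-value per objective (higher is better), invalid actions are masked with a large negative sentinel, and the agent picks uniformly at random among the non-dominated valid actions. The function names, the `PENALTY` constant, and the random tie-breaking are all hypothetical and are not taken from the paper.

```python
import random

# Assumed sentinel: far below any realistic Q-value, so masked actions are always dominated.
PENALTY = -1e9


def adjust_q_values(q_values, valid_mask, penalty=PENALTY):
    """Replace the Q-vector of every invalid action with the penalty on all objectives."""
    return [list(q) if ok else [penalty] * len(q) for q, ok in zip(q_values, valid_mask)]


def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_actions(q_values):
    """Indices of actions whose Q-vectors are not dominated by any other action."""
    return [
        i
        for i, qi in enumerate(q_values)
        if not any(dominates(qj, qi) for j, qj in enumerate(q_values) if j != i)
    ]


def select_action(q_values, valid_mask, rng=random):
    """Mask invalid actions, then choose among the non-dominated valid ones.

    Raises IndexError if no action is valid (the sketch does not handle that case).
    """
    adjusted = adjust_q_values(q_values, valid_mask)
    front = [i for i in non_dominated_actions(adjusted) if valid_mask[i]]
    return rng.choice(front)


# Hypothetical example: 4 candidate vehicle assignments, 2 objectives.
q = [[1.0, 5.0], [2.0, 2.0], [0.5, 0.5], [9.0, 9.0]]
valid = [True, True, True, False]  # the last assignment is infeasible
action = select_action(q, valid)
```

In this example the fourth action has the highest raw Q-values but is masked out, and the third is dominated by the first two, so the choice falls between actions 0 and 1. A real agent would likely use a more principled tie-break (for instance, a preference over objectives) than uniform random choice.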
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.