Qingshu Guan, Shuangsi Xue, Junkai Tan, Lixin Jia, Hui Cao, Badong Chen
Title: Dynamic embedding-based deep reinforcement learning for heterogeneous capacitated VRPs with unloading time constraints
Journal: Expert Systems with Applications, Volume 293, Article 128660
Publication date: 2025-06-20
DOI: 10.1016/j.eswa.2025.128660
URL: https://www.sciencedirect.com/science/article/pii/S095741742502278X
Journal metrics: impact factor 7.5; JCR Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Capacitated vehicle routing problems (CVRPs) have attracted growing attention because of their wide range of applications. However, existing deep reinforcement learning (DRL) approaches are mostly designed for homogeneous vehicle fleets and fail to account for differences in vehicle capacities and speeds. Moreover, these methods typically overlook the real-life constraint of unloading time, under which vehicles cannot depart until all goods have been delivered. These limitations restrict their practical use. To address these issues, we introduce a heterogeneous CVRP with unloading time constraints (HCVRP-UTC) and propose a dynamic embedding-based DRL method (DE-DRL) to solve it. Our approach uses an encoder-updater-decoder (EUD) framework. Specifically, the encoder generates feature embeddings for both customer nodes and heterogeneous vehicles, while the updater iteratively refines these embeddings, combining static customer data with dynamic vehicle information, to capture real-time state changes and provide sufficient information for decision-making. The decoder then decomposes the problem into a series of recursive vehicle-selection and vehicle-specific node-selection tasks, improving the precision and efficiency of route planning. Finally, we evaluate the proposed approach on both synthetic and real-world datasets of varying scales and distributions. Experimental results show that DE-DRL consistently outperforms heuristic and state-of-the-art DRL-based methods, reducing optimality gaps by up to 13.53%. DE-DRL also generalizes better than these baselines, which extends its applicability to a broader range of real-world scenarios.
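To make the problem setting concrete, the following is a minimal sketch of the two-step decision loop described in the abstract: first select a vehicle, then select a node for that vehicle. It is not the authors' DE-DRL model. A greedy rule stands in for the learned policy, and several details are assumptions made purely for illustration: unloading time proportional to demand (`unload_rate`), vehicles being allowed to return to the depot to reload, the makespan as the objective, and all names (`Vehicle`, `greedy_rollout`, `travel_time`).

```python
import math
from dataclasses import dataclass, field


@dataclass
class Vehicle:
    capacity: float                  # maximum load (heterogeneous across the fleet)
    speed: float                     # distance units per time unit (heterogeneous)
    load: float = 0.0                # goods currently on board
    pos: tuple = (0.0, 0.0)          # current location
    clock: float = 0.0               # time at which the vehicle is next free
    route: list = field(default_factory=list)


def travel_time(a, b, speed):
    """Euclidean distance divided by the vehicle's own speed."""
    return math.dist(a, b) / speed


def greedy_rollout(depot, customers, demands, vehicles, unload_rate):
    """Toy rollout: greedy vehicle selection, then greedy node selection.

    Returns the makespan (assumed objective): the time at which the last
    vehicle is back at the depot.
    """
    for v in vehicles:
        v.pos, v.load, v.clock, v.route = depot, v.capacity, 0.0, []
    unserved = set(range(len(customers)))

    while unserved:
        # Step 1 (vehicle selection): among vehicles able to serve at least one
        # remaining customer when fully loaded, take the one that is free earliest.
        usable = [v for v in vehicles
                  if any(demands[i] <= v.capacity for i in unserved)]
        if not usable:
            raise ValueError("a remaining demand exceeds every vehicle capacity")
        v = min(usable, key=lambda x: x.clock)

        # Step 2 (node selection): mask customers the current load cannot cover.
        feasible = [i for i in unserved if demands[i] <= v.load]
        if not feasible:
            # Assumption: the vehicle may return to the depot and reload.
            v.clock += travel_time(v.pos, depot, v.speed)
            v.pos, v.load = depot, v.capacity
            v.route.append("depot")
            continue
        i = min(feasible, key=lambda j: travel_time(v.pos, customers[j], v.speed))

        # Unloading-time constraint: the vehicle is busy until unloading is done.
        v.clock += travel_time(v.pos, customers[i], v.speed) + demands[i] * unload_rate
        v.pos = customers[i]
        v.load -= demands[i]
        v.route.append(i)
        unserved.remove(i)

    # Every vehicle finishes by driving back to the depot.
    for v in vehicles:
        if v.pos != depot:
            v.clock += travel_time(v.pos, depot, v.speed)
            v.pos = depot
    return max(v.clock for v in vehicles)


if __name__ == "__main__":
    fleet = [Vehicle(capacity=5.0, speed=1.0), Vehicle(capacity=5.0, speed=2.0)]
    makespan = greedy_rollout((0.0, 0.0), [(3.0, 4.0), (6.0, 8.0)],
                              [2.0, 2.0], fleet, unload_rate=0.5)
    print(makespan, [v.route for v in fleet])
```

In a learned approach such as the one the abstract describes, the two `min(...)` calls would be replaced by policy outputs conditioned on embeddings that are refreshed after every step. The sketch only shows where vehicle speed, capacity, and unloading time enter the state transition.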
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.