Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window

IF 6.6 4区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

ACM Transactions on Intelligent Systems and Technology Pub Date : 2024-01-25 DOI:10.1145/3625232

Zefang Zong, Tong Xia, Meng Zheng, Yong Li

{"title":"Reinforcement Learning for Solving Multiple Vehicle Routing Problem with Time Window","authors":"Zefang Zong, Tong Xia, Meng Zheng, Yong Li","doi":"10.1145/3625232","DOIUrl":null,"url":null,"abstract":"<p>Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to \\(11.7\\% \\) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.</p>","PeriodicalId":48967,"journal":{"name":"ACM Transactions on Intelligent Systems and Technology","volume":"161 1","pages":""},"PeriodicalIF":6.6000,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Intelligent Systems and Technology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3625232","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Vehicle routing problem with time window (VRPTW) is of great importance for a wide spectrum of services and real-life applications, such as online take-out and car-hailing platforms. A promising method should generate high-qualified solutions within limited inference time, and there are three major challenges: a) directly optimizing the goal with several practical constraints; b) efficiently handling individual time window limits; and c) modeling the cooperation among the vehicle fleet. In this paper, we present an end-to-end reinforcement learning framework to solve VRPTW. First, we propose an agent model that encodes constraints into features as the input, and conducts harsh policy on the output when generating deterministic results. Second, we design a time penalty augmented reward to model the time window limits during gradient propagation. Third, we design a task handler to enable the cooperation among different vehicles. We perform extensive experiments on two real-world datasets and one public benchmark dataset. Results demonstrate that our solution improves the performance by up to \(11.7\% \) compared to other RL baselines, and could generate solutions for instances within seconds while existing heuristic baselines take for minutes as well as maintaining the quality of solutions. Moreover, our solution is thoroughly analysed with meaningful implications due to the real-time response ability.

查看原文本刊更多论文

用强化学习解决带时间窗口的多车路由问题

带时间窗口的车辆路由问题（VRPTW）对于网络外卖和打车平台等广泛的服务和现实应用具有重要意义。一个有前途的方法应在有限的推理时间内生成高质量的解决方案，而目前存在三大挑战：a) 在多个实际约束条件下直接优化目标；b) 高效处理单个时间窗口限制；c) 对车队之间的合作进行建模。在本文中，我们提出了一个端到端的强化学习框架来解决 VRPTW。首先，我们提出了一个代理模型，该模型将约束条件编码成特征作为输入，并在生成确定性结果时对输出执行苛刻策略。其次，我们设计了一种时间惩罚增强奖励，以模拟梯度传播过程中的时间窗口限制。第三，我们设计了一个任务处理程序，以实现不同车辆之间的合作。我们在两个真实世界数据集和一个公共基准数据集上进行了大量实验。结果表明，与其他 RL 基线相比，我们的解决方案提高了高达（11.7%）的性能，并能在数秒内为实例生成解决方案，而现有的启发式基线则需要数分钟，同时还能保持解决方案的质量。此外，由于实时响应能力，我们的解决方案得到了全面的分析，并产生了有意义的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Intelligent Systems and Technology COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore

9.30

自引率

2.00%

发文量

131

期刊介绍： ACM Transactions on Intelligent Systems and Technology is a scholarly journal that publishes the highest quality papers on intelligent systems, applicable algorithms and technology with a multi-disciplinary perspective. An intelligent system is one that uses artificial intelligence (AI) techniques to offer important services (e.g., as a component of a larger system) to allow integrated systems to perceive, reason, learn, and act intelligently in the real world. ACM TIST is published quarterly (six issues a year). Each issue has 8-11 regular papers, with around 20 published journal pages or 10,000 words per paper. Additional references, proofs, graphs or detailed experiment results can be submitted as a separate appendix, while excessively lengthy papers will be rejected automatically. Authors can include online-only appendices for additional content of their published papers and are encouraged to share their code and/or data with other readers.