RL SolVeR Pro: Reinforcement Learning for Solving Vehicle Routing Problem

Arun Kumar Kalakanti, Shivani Verma, T. Paul, Takufumi Yoshida
{"title":"RL SolVeR Pro:用于解决车辆路线问题的强化学习","authors":"Arun Kumar Kalakanti, Shivani Verma, T. Paul, Takufumi Yoshida","doi":"10.1109/AiDAS47888.2019.8970890","DOIUrl":null,"url":null,"abstract":"Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem at the heart of the transportation and logistics research. VRP can be exactly solved only for small instances of the problem with conventional methods. Traditionally this problem has been solved using heuristic methods for large instances even though there is no guarantee of optimality. Efficient solution adopted to VRP may lead to significant savings per year in large transportation and logistics systems. Much of the recent works using Reinforcement Learning are computationally intensive and face the three curse of dimensionality: explosions in state and action spaces and high stochasticity i.e., large number of possible next states for a given state action pair. Also, recent works on VRP don’t consider the realistic simulation settings of customer environments, stochastic elements and scalability aspects as they use only standard Solomon benchmark instances of at most 100 customers. In this work, Reinforcement Learning Solver for Vehicle Routing Problem (RL SolVeR Pro) is proposed wherein the optimal route learning problem is cast as a Markov Decision Process (MDP). The curse of dimensionality of RL is also overcome by using two-phase solver with geometric clustering. Also, realistic simulation for VRP was used to validate the effectiveness and applicability of the proposed RL SolVeR Pro under various conditions and constraints. Our simulation results suggest that our proposed method is able to obtain better or same level of results, compared to the two best-known heuristics: Clarke-Wright Savings and Sweep Heuristic. The proposed RL Solver can be applied to other variants of the VRP and has the potential to be applied more generally to other combinatorial optimization problems.","PeriodicalId":227508,"journal":{"name":"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"RL SolVeR Pro: Reinforcement Learning for Solving Vehicle Routing Problem\",\"authors\":\"Arun Kumar Kalakanti, Shivani Verma, T. Paul, Takufumi Yoshida\",\"doi\":\"10.1109/AiDAS47888.2019.8970890\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem at the heart of the transportation and logistics research. VRP can be exactly solved only for small instances of the problem with conventional methods. Traditionally this problem has been solved using heuristic methods for large instances even though there is no guarantee of optimality. Efficient solution adopted to VRP may lead to significant savings per year in large transportation and logistics systems. Much of the recent works using Reinforcement Learning are computationally intensive and face the three curse of dimensionality: explosions in state and action spaces and high stochasticity i.e., large number of possible next states for a given state action pair. 
Also, recent works on VRP don’t consider the realistic simulation settings of customer environments, stochastic elements and scalability aspects as they use only standard Solomon benchmark instances of at most 100 customers. In this work, Reinforcement Learning Solver for Vehicle Routing Problem (RL SolVeR Pro) is proposed wherein the optimal route learning problem is cast as a Markov Decision Process (MDP). The curse of dimensionality of RL is also overcome by using two-phase solver with geometric clustering. Also, realistic simulation for VRP was used to validate the effectiveness and applicability of the proposed RL SolVeR Pro under various conditions and constraints. Our simulation results suggest that our proposed method is able to obtain better or same level of results, compared to the two best-known heuristics: Clarke-Wright Savings and Sweep Heuristic. The proposed RL Solver can be applied to other variants of the VRP and has the potential to be applied more generally to other combinatorial optimization problems.\",\"PeriodicalId\":227508,\"journal\":{\"name\":\"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AiDAS47888.2019.8970890\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 1st International Conference on Artificial Intelligence and Data Sciences (AiDAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AiDAS47888.2019.8970890","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 16

Abstract

The Vehicle Routing Problem (VRP) is a well-known NP-hard combinatorial optimization problem at the heart of transportation and logistics research. Conventional exact methods can solve VRP only for small instances. Large instances have traditionally been tackled with heuristic methods, even though these offer no guarantee of optimality. An efficient VRP solution can yield significant annual savings in large transportation and logistics systems. Much of the recent work using Reinforcement Learning (RL) is computationally intensive and faces the three curses of dimensionality: explosions in the state and action spaces, and high stochasticity, i.e., a large number of possible next states for a given state-action pair. Moreover, recent works on VRP do not consider realistic simulation of customer environments, stochastic elements, or scalability, as they use only standard Solomon benchmark instances of at most 100 customers. In this work, we propose the Reinforcement Learning Solver for the Vehicle Routing Problem (RL SolVeR Pro), in which optimal route learning is cast as a Markov Decision Process (MDP). The curse of dimensionality is overcome by using a two-phase solver with geometric clustering. Realistic VRP simulation is used to validate the effectiveness and applicability of RL SolVeR Pro under various conditions and constraints. Our simulation results suggest that the proposed method obtains results better than or on par with the two best-known heuristics: Clarke-Wright Savings and the Sweep heuristic. The proposed solver can be applied to other variants of the VRP and has the potential to generalize to other combinatorial optimization problems.
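As the abstract describes, route learning is cast as an MDP: the state tracks the vehicle's position and the remaining customers, an action picks the next customer, and the reward penalizes travel distance. The sketch below shows one way such a formulation can be trained with tabular Q-learning; the state/action design, the negative-distance reward, and the choice of Q-learning are illustrative assumptions, since the abstract does not specify the paper's exact algorithm.

import random

def q_learning_route(dist, n_customers, episodes=2000,
                     alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn a single-vehicle tour over customers 1..n (depot = node 0).

    State: current node; action: next unvisited customer.
    Reward: negative travel distance, so maximizing return minimizes length.
    """
    Q = {}  # (state, action) -> estimated return
    for _ in range(episodes):
        state, unvisited = 0, set(range(1, n_customers + 1))
        while unvisited:
            # epsilon-greedy selection among unvisited customers
            if random.random() < epsilon:
                action = random.choice(list(unvisited))
            else:
                action = max(unvisited, key=lambda a: Q.get((state, a), 0.0))
            reward = -dist[state][action]
            unvisited.discard(action)
            # one-step Q-learning update; when no customers remain,
            # bootstrap with the cost of returning to the depot
            best_next = max((Q.get((action, a), 0.0) for a in unvisited),
                            default=-dist[action][0])
            old = Q.get((state, action), 0.0)
            Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = action
    return Q

A greedy rollout over the learned Q table (always taking the highest-valued unvisited customer from the current node) then reads off the tour.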
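The "two-phase solver with geometric clustering" suggests a cluster-first, route-second design: group customers geometrically so each cluster is a small, tractable subproblem, then route each cluster independently (e.g., with the learner above). The sketch below clusters by polar angle around the depot subject to vehicle capacity, in the spirit of the classic Sweep heuristic; the angle-based rule and the function name geometric_clusters are assumptions for illustration, not the paper's confirmed procedure.

import math

def geometric_clusters(depot, customers, demands, capacity):
    """Partition customers into capacity-feasible clusters by polar angle.

    depot: (x, y); customers: list of (x, y); demands: parallel list of loads.
    Returns a list of lists of customer indices.
    """
    # sweep customers in order of angle around the depot
    order = sorted(range(len(customers)),
                   key=lambda i: math.atan2(customers[i][1] - depot[1],
                                            customers[i][0] - depot[0]))
    clusters, current, load = [], [], 0.0
    for i in order:
        if current and load + demands[i] > capacity:
            clusters.append(current)  # close the cluster, continue the sweep
            current, load = [], 0.0
        current.append(i)
        load += demands[i]
    if current:
        clusters.append(current)
    return clusters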
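For reference, the stronger of the two baselines, Clarke-Wright Savings, ranks customer pairs by the detour saved when their separate depot round trips are merged into one route: s(i, j) = d(0, i) + d(0, j) - d(i, j), with node 0 as the depot. A minimal sketch of the savings computation follows; the full heuristic then merges routes greedily in this order, subject to capacity.

def savings_list(dist, n_customers):
    """Return (saving, i, j) triples in decreasing order of saving."""
    savings = []
    for i in range(1, n_customers + 1):
        for j in range(i + 1, n_customers + 1):
            s = dist[0][i] + dist[0][j] - dist[i][j]
            savings.append((s, i, j))
    savings.sort(reverse=True)
    return savings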