Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates

IF 4.8, Zone 2 (Engineering & Technology), Q1 OPERATIONS RESEARCH & MANAGEMENT SCIENCE

Transportation Science, Pub Date: 2024-07-18, DOI: 10.1287/trsc.2022.0366

Yuanyuan Li, Claudia Archetti, Ivana Ljubić

Citations: 0

Abstract

In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the expected number of parcels that can be delivered during service hours. We propose two reinforcement learning (RL) approaches for solving this problem. These approaches rely on a look-ahead strategy in which future release dates are sampled in a Monte Carlo fashion, and a batch approach is used to approximate future routes. Both RL approaches are based on value function approximation: one combines it with a consensus function (VFA-CF) and the other with a two-stage stochastic integer linear programming model (VFA-2S). VFA-CF and VFA-2S do not need extensive training, as they are based on very few hyperparameters and make good use of integer linear programming (ILP) and branch-and-cut–based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into VFA-CF/VFA-2S. In an empirical study, we conduct a competitive analysis using upper bounds with perfect information. We also show that VFA-CF and VFA-2S greatly outperform alternative approaches that (1) do not rely on future information, (2) are based on point estimation of future information, (3) use heuristics rather than exact methods, or (4) use exact evaluations of future rewards.

Funding: This work was supported by the CY Initiative of Excellence [ANR-16-IDEX-0008].

Supplemental Material: The online appendices are available at https://doi.org/10.1287/trsc.2022.0366.
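The look-ahead idea in the abstract (sample future release dates, score each candidate decision per scenario, then let the scenarios vote) can be illustrated with a toy sketch. This is not the authors' VFA-CF: the uniform release-date model, the fixed `trip_time` that stands in for routing, and every function name below are assumptions made only for illustration, and the crude batch rule replaces the ILP and branch-and-cut components the paper relies on.

```python
import random
from collections import Counter

ACTIONS = ("dispatch", "wait")


def sample_release_dates(n_pending, now, deadline, rng):
    """Hypothetical release model: uniform on [now, 1.2 * deadline],
    so some parcels may arrive too late to be delivered."""
    return [rng.uniform(now, 1.2 * deadline) for _ in range(n_pending)]


def scenario_value(action, available, releases, now, deadline, trip_time):
    """Parcels delivered in one sampled scenario under a crude batch rule.

    Assumption: every trip takes `trip_time`, whatever it carries.
    """
    if action == "dispatch":
        if now + trip_time > deadline:
            return 0  # the trip would end after service hours
        delivered = available
        back = now + trip_time
        # Batch approximation: one more trip on return, if time allows,
        # carrying every parcel released by then.
        if back + trip_time <= deadline:
            delivered += sum(r <= back for r in releases)
        return delivered
    # "wait": a single trip leaving at the latest feasible departure time.
    depart = deadline - trip_time
    if depart < now:
        return 0
    return available + sum(r <= depart for r in releases)


def consensus_decision(available, n_pending, now, deadline, trip_time,
                       n_scenarios=200, seed=0):
    """Majority vote over sampled scenarios (a simple consensus function)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_scenarios):
        releases = sample_release_dates(n_pending, now, deadline, rng)
        values = {a: scenario_value(a, available, releases,
                                    now, deadline, trip_time)
                  for a in ACTIONS}
        # Ties go to "dispatch", the first action in ACTIONS.
        votes[max(values, key=values.get)] += 1
    return votes.most_common(1)[0][0], votes
```

Calling `consensus_decision(available=1, n_pending=20, now=0.0, deadline=10.0, trip_time=8.0)` returns the winning action together with the vote counts. In that setting only one trip fits into the service hours, so most sampled scenarios favor waiting for more parcels before leaving.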

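Source journal

Transportation Science (Engineering & Technology: Operations Research & Management Science)

CiteScore: 8.30

Self-citation rate: 10.90%

Articles published: 111

Review time: 12 months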

Journal description: Transportation Science, published quarterly by INFORMS, is the flagship journal of the Transportation Science and Logistics Society of INFORMS. As the foremost scientific journal in the cross-disciplinary operational research field of transportation analysis, Transportation Science publishes high-quality original contributions and surveys on phenomena associated with all modes of transportation, present and prospective, including mainly all levels of planning, design, economic, operational, and social aspects. Transportation Science focuses primarily on fundamental theories, coupled with observational and experimental studies of transportation and logistics phenomena and processes, mathematical models, advanced methodologies and novel applications in transportation and logistics systems analysis, planning and design. The journal covers a broad range of topics that include vehicular and human traffic flow theories, models and their application to traffic operations and management; strategic, tactical, and operational planning of transportation and logistics systems; performance analysis methods and system design and optimization; theories and analysis methods for network and spatial activity interaction, equilibrium and dynamics; economics of transportation system supply and evaluation; methodologies for analysis of transportation user behavior and the demand for transportation and logistics services. Transportation Science is international in scope, with editors from nations around the globe. The editorial board reflects the diverse interdisciplinary interests of the transportation science and logistics community, with members that hold primary affiliations in engineering (civil, industrial, and aeronautical), physics, economics, applied mathematics, and business.