Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates
Yuanyuan Li, Claudia Archetti, Ivana Ljubić
Transportation Science (journal article), published 18 July 2024. DOI: 10.1287/trsc.2022.0366 (https://doi.org/10.1287/trsc.2022.0366)
Abstract
In this paper, we study a sequential decision-making problem faced by e-commerce carriers: when to dispatch a vehicle from the central depot to serve customer requests, and in which order to serve them, under the assumption that the times at which parcels arrive at the depot are stochastic and dynamic. The objective is to maximize the expected number of parcels that can be delivered during service hours. We propose two reinforcement learning (RL) approaches for solving this problem. These approaches rely on a look-ahead strategy in which future release dates are sampled in a Monte Carlo fashion and a batch approach is used to approximate future routes. Both RL approaches are based on value function approximation: one combines it with a consensus function (VFA-CF), and the other with a two-stage stochastic integer linear programming model (VFA-2S). VFA-CF and VFA-2S do not need extensive training, as they rely on very few hyperparameters and make good use of integer linear programming (ILP) and branch-and-cut–based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into VFA-CF/VFA-2S. In an empirical study, we conduct a competitive analysis using upper bounds with perfect information. We also show that VFA-CF and VFA-2S greatly outperform alternative approaches that (1) do not rely on future information, (2) are based on point estimation of future information, (3) use heuristics rather than exact methods, or (4) use exact evaluations of future rewards.
Funding: This work was supported by the CY Initiative of Excellence [ANR-16-IDEX-0008].
Supplemental Material: The online appendices are available at https://doi.org/10.1287/trsc.2022.0366.
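The abstract only outlines the look-ahead idea. To illustrate the generic pattern it names (Monte Carlo sampling of release dates combined with a consensus rule), here is a minimal sketch. It is not the authors' VFA-CF and omits their value function approximation, ILP, and branch-and-cut components. The uniform release-date model, the "trip time = fixed overhead + per-parcel service time" stand-in for routing, the greedy batch base policy, and all names (`sample_scenario`, `rollout`, `consensus_decision`) are hypothetical assumptions. The consensus rule picks the candidate departure time that is best in the largest number of sampled scenarios; the sample-average reward is returned alongside for comparison.

```python
import random
from collections import Counter


def sample_scenario(known, pending, rng):
    """Return one scenario: a release date for every parcel.

    known:   {parcel: release_date} for parcels already at the depot.
    pending: {parcel: (lo, hi)}; HYPOTHETICAL uniform release-date model.
    """
    scenario = dict(known)
    for p, (lo, hi) in pending.items():
        scenario[p] = rng.uniform(lo, hi)
    return scenario


def rollout(depart, release, service, overhead, horizon):
    """Count parcels delivered by `horizon` under a simple batch base policy.

    From time `depart` on, whenever the vehicle is at the depot it loads every
    released parcel that still fits (shortest service time first) and leaves.
    Trip duration = overhead + sum of per-parcel service times (HYPOTHETICAL
    stand-in for a real routing model).
    """
    t, delivered = depart, 0
    remaining = set(release)
    while True:
        ready = sorted((p for p in remaining if release[p] <= t), key=service.get)
        loaded, trip = [], overhead
        for p in ready:
            if t + trip + service[p] <= horizon:
                loaded.append(p)
                trip += service[p]
        if loaded:
            delivered += len(loaded)
            remaining.difference_update(loaded)
            t += trip
            continue
        # Nothing fits right now: jump to the next release date, if any.
        future = [release[p] for p in remaining if release[p] > t]
        if not future:
            return delivered
        t = min(future)


def consensus_decision(actions, known, pending, service, overhead, horizon,
                       n_scenarios=100, seed=0):
    """Pick the departure time that is best in the most sampled scenarios.

    Also returns the sample-average reward of each action (the plain Monte
    Carlo expectation) for comparison.
    """
    rng = random.Random(seed)
    votes = Counter()
    totals = {a: 0.0 for a in actions}
    for _ in range(n_scenarios):
        release = sample_scenario(known, pending, rng)
        rewards = {a: rollout(a, release, service, overhead, horizon) for a in actions}
        for a in actions:
            totals[a] += rewards[a]
        # max() keeps the first maximal action, so ties favour the action listed first.
        votes[max(actions, key=rewards.get)] += 1
    best = max(actions, key=lambda a: votes[a])
    return best, {a: totals[a] / n_scenarios for a in actions}


if __name__ == "__main__":
    known = {"A": 0.0}            # parcel A is already at the depot
    pending = {"B": (1.0, 2.0)}   # parcel B arrives at a random time in [1, 2]
    service = {"A": 1.0, "B": 1.0}
    best, expected = consensus_decision([0.0, 2.0], known, pending, service,
                                        overhead=4.0, horizon=9.0)
    print(best, expected)  # 2.0 {0.0: 1.0, 2.0: 2.0}
```

In this toy instance, dispatching at time 0 pays the fixed trip overhead twice, so the second trip cannot return before the horizon; waiting until time 2 lets both parcels share one trip, and every scenario votes to wait.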
About the journal:
Transportation Science, published quarterly by INFORMS, is the flagship journal of the Transportation Science and Logistics Society of INFORMS. As the foremost scientific journal in the cross-disciplinary operational research field of transportation analysis, Transportation Science publishes high-quality original contributions and surveys on phenomena associated with all modes of transportation, present and prospective, covering mainly all levels of planning and design as well as economic, operational, and social aspects. Transportation Science focuses primarily on fundamental theories, coupled with observational and experimental studies of transportation and logistics phenomena and processes, mathematical models, advanced methodologies, and novel applications in transportation and logistics systems analysis, planning, and design. The journal covers a broad range of topics, including vehicular and human traffic flow theories, models, and their application to traffic operations and management; strategic, tactical, and operational planning of transportation and logistics systems; performance analysis methods and system design and optimization; theories and analysis methods for network and spatial activity interaction, equilibrium, and dynamics; economics of transportation system supply and evaluation; and methodologies for analyzing transportation user behavior and the demand for transportation and logistics services.
Transportation Science is international in scope, with editors from nations around the globe. The editorial board reflects the diverse interdisciplinary interests of the transportation science and logistics community, with members that hold primary affiliations in engineering (civil, industrial, and aeronautical), physics, economics, applied mathematics, and business.