A deep reinforcement learning method for solving Two-Echelon Location-Routing Problem

IF 4.3 · Zone 2, Engineering & Technology · JCR Q2 · COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Operations Research Pub Date : 2025-07-21 DOI:10.1016/j.cor.2025.107210

Shuo Huang , Yaoxin Wu , Zhiguang Cao , Xuexi Zhang

Volume 183, Article 107210. Full text: https://www.sciencedirect.com/science/article/pii/S0305054825002382

Citations: 0

Abstract

In logistics and supply chain management, optimizing distribution networks is crucial for improving efficiency and reducing operational costs. This paper addresses the Two-Echelon Location-Routing Problem (2E-LRP), which aims to jointly optimize the placement of facilities (i.e., transfer stations and depots) and the vehicle routes that transport goods between depots, transfer stations, and customers. We propose a deep reinforcement learning method that minimizes the total cost, comprising the operational cost of facilities, the cost of vehicle usage, and the transportation cost. Specifically, we design a two-stage attention model with an encoder–decoder structure that constructs solutions to the location-routing problems in the two echelons, respectively. A simple yet effective recurrent unit is used in the decoder to capture context embeddings, allowing the model to selectively incorporate beneficial information from previous construction steps. The contexts are then used in the attention computation to select facilities and customers, thereby determining facility placements and routes. The model is trained with the REINFORCE algorithm using a shared baseline, and its performance is validated through comparisons with the Gurobi solver and typical heuristic algorithms. Extensive results show favorable performance on both synthetic and benchmark instances, making the model a competitive alternative to traditional solution methods. In particular, it achieves up to 1.5% cost reduction and over 99% computation time savings compared with traditional heuristic algorithms on large instances. In addition, it generalizes fairly well to instances of different scales and distributions.
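The cost objective and the training signal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the total cost is the sum of facility opening costs, a single fixed cost per vehicle (the paper may use different costs per echelon), and Euclidean travel distance, and that the "shared baseline" is the mean cost over several solutions sampled for the same instance. All function and parameter names here are hypothetical.

```python
from math import dist
from statistics import fmean


def route_length(route, coords):
    """Euclidean length of a route given as a list of node ids.

    A closed tour repeats its start node at the end, e.g. [0, 1, 2, 0].
    """
    return sum(dist(coords[a], coords[b]) for a, b in zip(route, route[1:]))


def total_cost(open_facilities, routes, coords, facility_cost, vehicle_cost):
    """Assumed objective: facility cost + vehicle usage cost + travel cost."""
    opening = sum(facility_cost[f] for f in open_facilities)
    vehicles = vehicle_cost * len(routes)  # one vehicle per route (assumption)
    travel = sum(route_length(r, coords) for r in routes)
    return opening + vehicles + travel


def shared_baseline_advantages(costs):
    """Advantage of each sampled solution relative to the mean cost.

    Assumption: the baseline is shared across all solutions sampled
    for one instance and equals their average cost.
    """
    baseline = fmean(costs)
    return [c - baseline for c in costs]


def reinforce_surrogate(costs, log_probs):
    """REINFORCE surrogate loss: mean of advantage * log-probability.

    In real training, log_probs would be produced by the model and the
    gradient of this value would be taken by an autodiff framework.
    """
    advantages = shared_baseline_advantages(costs)
    return fmean([a * lp for a, lp in zip(advantages, log_probs)])


# Toy instance: one open facility at node 0 serving two customers.
coords = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0)}
cost = total_cost(
    open_facilities=[0],
    routes=[[0, 1, 2, 0]],
    coords=coords,
    facility_cost={0: 10.0},
    vehicle_cost=2.0,
)
advantages = shared_baseline_advantages([24.0, 26.0, 22.0])
loss = reinforce_surrogate([24.0, 26.0, 22.0], [-1.0, -2.0, -3.0])
```

Minimizing the surrogate raises the probability of sampled solutions whose cost is below the baseline and lowers it for those above, which is why a baseline computed from sibling samples needs no separate critic network.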




Source journal

Computers & Operations Research · Engineering & Technology - Engineering: Industrial

CiteScore

8.60

Self-citation rate

8.70%

Articles published

292

Review time

8.5 months

About the journal: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.