Rapid Solution for Flexible Pickup and Delivery Services Problem Based on Improved Actor-Critic Deep Reinforcement Learning

IF 8.4 | Zone 1 (Engineering & Technology) | JCR Q1 ENGINEERING, CIVIL

IEEE Transactions on Intelligent Transportation Systems. Pub Date: 2025-04-21. DOI: 10.1109/TITS.2025.3559941

Ran Tian, Zhihui Sun, Longlong Chang, Jiarui Wu, Xin Lu

Volume 26, Issue 6, pages 7640-7654. IEEE Xplore: https://ieeexplore.ieee.org/document/10972166/

Citations: 0

Abstract

The Flexible Pickup and Delivery Services Problem (FPDSP) arises from the practical needs of multi-warehouse management strategies and is one of the key challenges facing the urban distribution logistics industry today. The goal is to compute vehicle routes quickly in complex scenarios so that the vehicle's total travel time is minimized while all time-window requirements are met. To address this problem, we propose a deep reinforcement learning method based on the Actor-Critic algorithm that rapidly computes near-optimal solutions to the FPDSP. Specifically, we propose a Transformer Model with Parallel Encoders (TMPE). The model extracts order features efficiently through parallel encoders and then uses serial decoders to fuse this feature information and improve the order selection process. In addition, we design a reward function that reduces the number of repeated pickups the vehicle makes at the same consignor's location across different orders, thereby reducing the vehicle's total travel time. Experimental results on seven different datasets show that our method finds feasible solutions faster than heuristic methods. Compared with all baseline methods, our method attains the optimal solution in 14 cases, markedly improving problem-solving ability. This result offers a new approach to optimizing pickup and delivery logistics across multiple urban warehouses.
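To make the reward idea concrete, here is a minimal Python sketch, assuming a single vehicle, a precomputed travel-time matrix, and a route given as a list of `(kind, location)` stops. The abstract does not state the paper's reward formula, so the form `-(total_travel_time + penalty * repeated_pickups)`, the penalty weight, and all helper names (`route_travel_time`, `count_repeated_pickups`, `shaped_reward`, `actor_critic_losses`) are hypothetical. Time-window feasibility is omitted. The last function shows the generic advantage-weighted actor loss and squared-error critic loss of REINFORCE-style actor-critic training with a learned baseline; it is a standard variant, not the authors' improved Actor-Critic.

```python
from typing import Sequence, Tuple

# A stop is (kind, location), kind in {"depot", "pickup", "delivery"}.
# Hypothetical simplified route representation, not the paper's encoding.
Stop = Tuple[str, int]


def route_travel_time(route: Sequence[Stop], travel: Sequence[Sequence[float]]) -> float:
    """Sum travel times between consecutive stops (waiting and service times ignored)."""
    return sum(travel[a[1]][b[1]] for a, b in zip(route, route[1:]))


def count_repeated_pickups(route: Sequence[Stop]) -> int:
    """Count returns to a consignor location already used for an earlier pickup.

    Consecutive pickups at one location (several orders collected in a single
    visit) are not counted; only leaving the location and coming back is.
    """
    seen = set()
    repeats = 0
    prev_loc = None
    for kind, loc in route:
        if kind == "pickup":
            if loc in seen and loc != prev_loc:
                repeats += 1
            seen.add(loc)
        prev_loc = loc
    return repeats


def shaped_reward(route: Sequence[Stop], travel, penalty: float) -> float:
    """Hypothetical reward: negative travel time minus a repeated-pickup penalty."""
    return -(route_travel_time(route, travel) + penalty * count_repeated_pickups(route))


def actor_critic_losses(rewards, baselines, log_probs):
    """Return (actor, critic) losses: mean(-(R - b) * log p) and mean((R - b)^2).

    With autodiff, the advantage (R - b) would be detached in the actor term
    so that only the policy receives that gradient.
    """
    n = len(rewards)
    advantages = [r - b for r, b in zip(rewards, baselines)]
    actor = sum(-adv * lp for adv, lp in zip(advantages, log_probs)) / n
    critic = sum(adv * adv for adv in advantages) / n
    return actor, critic


# Toy symmetric travel-time matrix: 0 = depot, 1-2 = consignors, 3-4 = customers.
TRAVEL = [
    [0, 4, 6, 7, 8],
    [4, 0, 5, 3, 4],
    [6, 5, 0, 6, 5],
    [7, 3, 6, 0, 2],
    [8, 4, 5, 2, 0],
]
# Both routes serve two orders from consignor 1; the second returns to it.
BATCHED = [("depot", 0), ("pickup", 1), ("pickup", 1),
           ("delivery", 3), ("delivery", 4), ("depot", 0)]
RETURNING = [("depot", 0), ("pickup", 1), ("delivery", 3),
             ("pickup", 1), ("delivery", 4), ("depot", 0)]

if __name__ == "__main__":
    for name, route in (("batched", BATCHED), ("returning", RETURNING)):
        print(name, route_travel_time(route, TRAVEL), shaped_reward(route, TRAVEL, 10.0))
```

In this toy instance the returning route travels 22 time units instead of 17 and, with `penalty=10.0`, receives a reward of -32.0 rather than -17.0. The explicit penalty gives the policy a sharper signal against revisiting a consignor than travel time alone would.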


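Source journal

IEEE Transactions on Intelligent Transportation Systems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.80

Self-citation rate: 12.90%

Articles published: 1872

Review time: 7.5 months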

Journal scope: The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation, and coordination of ITS technical activities among IEEE entities, and the provision of a focus for cooperative activities, both internally and externally.