Cross-Region Courier Displacement for On-Demand Delivery With Multi-Agent Reinforcement Learning

IF 7.5 · Zone 3, Computer Science · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data Pub Date : 2023-03-28 DOI:10.1109/TBDATA.2023.3262408

Shuai Wang;Shijie Hu;Baoshen Guo;Guang Wang

IEEE Transactions on Big Data, Volume 9, Issue 5, pp. 1321-1333. Full text: https://ieeexplore.ieee.org/document/10083277/

Citations: 1

Abstract

On-demand delivery has become a prevalent way for people to order meals and groceries online, especially during the pandemic. It is essential to dispatch massive numbers of orders to a limited pool of couriers to satisfy on-demand delivery users, especially during peak hours. Existing studies mainly focus on order dispatching within a region, and they are difficult to apply to the cross-region courier displacement problem due to (1) unique practical factors, including regional spatial-temporal demand-supply dynamics and strict delivery time constraints, and (2) the large-scale setting and high-dimensional decision space given the massive number of couriers in on-demand delivery. To address these challenges, in this work, we propose an efficient cross-region courier displacement framework, Courier Displacement Reinforcement Learning (CDRL), based on a centralized multi-agent actor-critic. CDRL first designs the actor-critic network with a time-varying displacement intensity control module to capture demand-supply dynamics, and then uses a centralized-training, decentralized-execution multi-agent framework to address large-scale coordination. One month of real-world order records collected from one of the biggest on-demand delivery services in the world is used to evaluate our design. The results show that our method achieves a 47.97% increase in supply-demand balance while reducing idle ride time by 24.62%.
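To make the moving parts named in the abstract more concrete, the following is a minimal toy sketch of centralized training with decentralized execution plus a time-varying displacement gate. It is not the authors' CDRL implementation: the class `RegionActor`, the functions `centralized_critic`, `displacement_intensity`, and `step`, the hand-written scoring rule, the peak hours, and the 0.5/0.1 gate values are all hypothetical choices made for illustration, and nothing here is learned.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class RegionActor:
    """Decentralized actor: reads only its own region and its neighbors."""
    def __init__(self, region, neighbors):
        self.region = region
        self.targets = [region] + list(neighbors)  # index 0 means "stay"

    def policy(self, supply, demand):
        # Assumed hand-written score (a real actor would be a trained network):
        # prefer targets where demand exceeds supply.
        scores = [demand[r] - supply[r] for r in self.targets]
        return softmax(scores)

    def act(self, supply, demand, rng):
        probs = self.policy(supply, demand)
        return rng.choices(self.targets, weights=probs, k=1)[0]

def centralized_critic(supply, demand):
    """Centralized critic, used only during training: scores the global state.
    Assumed stand-in for a learned value: negative total mismatch."""
    return -sum(abs(supply[r] - demand[r]) for r in supply)

def displacement_intensity(hour, peak_hours=(11, 12, 18, 19)):
    """Assumed time-varying gate: share of couriers allowed to relocate."""
    return 0.5 if hour in peak_hours else 0.1

def step(actors, supply, demand, hour, rng):
    """Relocate a time-dependent share of each region's couriers."""
    new_supply = dict(supply)
    for actor in actors:
        movers = int(supply[actor.region] * displacement_intensity(hour))
        for _ in range(movers):
            target = actor.act(supply, demand, rng)
            new_supply[actor.region] -= 1
            new_supply[target] += 1
    return new_supply

supply = {"A": 10, "B": 2, "C": 4}
demand = {"A": 2, "B": 8, "C": 4}
actors = [RegionActor("A", ["B", "C"]), RegionActor("B", ["A"]), RegionActor("C", ["A"])]
new_supply = step(actors, supply, demand, hour=12, rng=random.Random(0))
print(new_supply, centralized_critic(new_supply, demand))
```

Each actor decides from its local view, while the critic evaluates all regions at once; in an actual actor-critic setup the critic's value would drive gradient updates to the actors during training and be dropped at execution time.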


Source journal

IEEE Transactions on Big Data

CiteScore

11.80

Self-citation rate

2.80%

Articles published

114

Journal description: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.