Efficient and Effective Express via Contextual Cooperative Reinforcement Learning

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining Pub Date : 2019-07-25 DOI:10.1145/3292500.3330968

Yexin Li, Yu Zheng, Qiang Yang

{"title":"Efficient and Effective Express via Contextual Cooperative Reinforcement Learning","authors":"Yexin Li, Yu Zheng, Qiang Yang","doi":"10.1145/3292500.3330968","DOIUrl":null,"url":null,"abstract":"Express systems are widely deployed in many major cities. Couriers in an express system load parcels at transit station and deliver them to customers. Meanwhile, they also try to serve the pick-up requests which come stochastically in real time during the delivery process. Having brought much convenience and promoted the development of e-commerce, express systems face challenges on courier management to complete the massive number of tasks per day. Considering this problem, we propose a reinforcement learning based framework to learn a courier management policy. Firstly, we divide the city into independent regions, in each of which a constant number of couriers deliver parcels and serve requests cooperatively. Secondly, we propose a soft-label clustering algorithm named Balanced Delivery-Service Burden (BDSB) to dispatch parcels to couriers in each region. BDSB guarantees that each courier has almost even delivery and expected request-service burden when departing from transit station, giving a reasonable initialization for online management later. As pick-up requests come in real time, a Contextual Cooperative Reinforcement Learning (CCRL) model is proposed to guide where should each courier deliver and serve in each short period. Being formulated in a multi-agent way, CCRL focuses on the cooperation among couriers while also considering the system context. Experiments on real-world data from Beijing are conducted to confirm the outperformance of our model.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3292500.3330968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 26

Abstract

Express systems are widely deployed in many major cities. Couriers in an express system load parcels at transit station and deliver them to customers. Meanwhile, they also try to serve the pick-up requests which come stochastically in real time during the delivery process. Having brought much convenience and promoted the development of e-commerce, express systems face challenges on courier management to complete the massive number of tasks per day. Considering this problem, we propose a reinforcement learning based framework to learn a courier management policy. Firstly, we divide the city into independent regions, in each of which a constant number of couriers deliver parcels and serve requests cooperatively. Secondly, we propose a soft-label clustering algorithm named Balanced Delivery-Service Burden (BDSB) to dispatch parcels to couriers in each region. BDSB guarantees that each courier has almost even delivery and expected request-service burden when departing from transit station, giving a reasonable initialization for online management later. As pick-up requests come in real time, a Contextual Cooperative Reinforcement Learning (CCRL) model is proposed to guide where should each courier deliver and serve in each short period. Being formulated in a multi-agent way, CCRL focuses on the cooperation among couriers while also considering the system context. Experiments on real-world data from Beijing are conducted to confirm the outperformance of our model.

查看原文本刊更多论文

基于上下文合作强化学习的高效表达

许多大城市都广泛部署了快车系统。快递系统中的快递员在中转站装载包裹并将它们送到客户手中。与此同时，他们也试图在送货过程中实时处理随机出现的提货请求。快递系统在为电子商务的发展带来诸多便利的同时，也面临着快递管理方面的挑战。考虑到这个问题，我们提出了一个基于强化学习的框架来学习快递管理策略。首先，我们将城市划分为独立的区域，每个区域有一定数量的快递员合作投递包裹和服务请求。其次，我们提出了一种名为平衡配送服务负担(BDSB)的软标签聚类算法，将包裹分配给各个地区的快递员。BDSB保证每个快递员从中转站出发时的派送和预期的请求服务负担几乎相等，为后期的在线管理提供了合理的初始化。由于取件请求是实时的，提出了一种上下文合作强化学习(CCRL)模型来指导每个快递员在每个短时间内应该在哪里投递和服务。CCRL以多智能体的方式制定，在考虑系统环境的同时注重快递员之间的合作。对来自北京的真实数据进行了实验，以证实我们的模型的优异性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

自引率

0.00%

发文量