Baoshen Guo, Shuai Wang, Yi Ding, Guang Wang, Suining He, Desheng Zhang, Tian He
{"title":"基于时间约束的Actor-Critic强化学习的即时交付并发订单调度","authors":"Baoshen Guo, Shuai Wang, Yi Ding, Guang Wang, Suining He, Desheng Zhang, Tian He","doi":"10.1109/rtss52674.2021.00026","DOIUrl":null,"url":null,"abstract":"Instant delivery has developed rapidly in recent years and significantly changed the lifestyle of people due to its timeliness and convenience. In instant delivery, the order dispatch process is concurrent. Couriers take new orders continuously and deliver multiple orders in a delivery trip (i.e., a batch). The delivery time of orders in a batch is often overlapped and interlinked with each other. The pickup and delivery sequence of the existing orders in a batch changes dynamically due to time constraints and real-time overdue possibility (i.e., the rate of deliveries that are not finished in promised time). Most of existing order dispatch mechanisms are designed for independent order dispatch or concurrent delivery without strict time constraints, hence are incapable of handling real-time concurrent dispatch with strict time constraints in on-demand instant delivery.To address the challenge, we propose a Time-Constrained Actor-Critic Reinforcement learning based concurrent dispatch system called TCAC-Dispatch to enhance the long-term overall revenue and reduce the overdue rate. Specifically, we design a deep matching network (DMN) with a variable action space, which integrates the state embedding (including route behaviors encoding) and actions embedding features into a long-term matching value. Then the Actor-Critic model tackles the concurrent order dispatch problem considering strict time constraints and stochastic demand-supply in instant delivery. An estimated-time based action pruning module is designed to ensure time constraints guarantee and accelerate the training as well as dispatching processes. We evaluate the TCAC-Dispatch with one-month data involved with 36.48 million orders and 42,000 couriers collected from one of the largest instant delivery companies in China, i.e., Eleme. Empirical experiments are conducted on a data-driven emulator deployed on the development environment of Eleme and results show that our method achieves 22% of the increase in total revenue and reduces the overdue rate by 21.6%.","PeriodicalId":102789,"journal":{"name":"2021 IEEE Real-Time Systems Symposium (RTSS)","volume":"9 12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Concurrent Order Dispatch for Instant Delivery with Time-Constrained Actor-Critic Reinforcement Learning\",\"authors\":\"Baoshen Guo, Shuai Wang, Yi Ding, Guang Wang, Suining He, Desheng Zhang, Tian He\",\"doi\":\"10.1109/rtss52674.2021.00026\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Instant delivery has developed rapidly in recent years and significantly changed the lifestyle of people due to its timeliness and convenience. In instant delivery, the order dispatch process is concurrent. Couriers take new orders continuously and deliver multiple orders in a delivery trip (i.e., a batch). The delivery time of orders in a batch is often overlapped and interlinked with each other. The pickup and delivery sequence of the existing orders in a batch changes dynamically due to time constraints and real-time overdue possibility (i.e., the rate of deliveries that are not finished in promised time). Most of existing order dispatch mechanisms are designed for independent order dispatch or concurrent delivery without strict time constraints, hence are incapable of handling real-time concurrent dispatch with strict time constraints in on-demand instant delivery.To address the challenge, we propose a Time-Constrained Actor-Critic Reinforcement learning based concurrent dispatch system called TCAC-Dispatch to enhance the long-term overall revenue and reduce the overdue rate. Specifically, we design a deep matching network (DMN) with a variable action space, which integrates the state embedding (including route behaviors encoding) and actions embedding features into a long-term matching value. Then the Actor-Critic model tackles the concurrent order dispatch problem considering strict time constraints and stochastic demand-supply in instant delivery. An estimated-time based action pruning module is designed to ensure time constraints guarantee and accelerate the training as well as dispatching processes. We evaluate the TCAC-Dispatch with one-month data involved with 36.48 million orders and 42,000 couriers collected from one of the largest instant delivery companies in China, i.e., Eleme. Empirical experiments are conducted on a data-driven emulator deployed on the development environment of Eleme and results show that our method achieves 22% of the increase in total revenue and reduces the overdue rate by 21.6%.\",\"PeriodicalId\":102789,\"journal\":{\"name\":\"2021 IEEE Real-Time Systems Symposium (RTSS)\",\"volume\":\"9 12\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE Real-Time Systems Symposium (RTSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/rtss52674.2021.00026\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE Real-Time Systems Symposium (RTSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/rtss52674.2021.00026","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Concurrent Order Dispatch for Instant Delivery with Time-Constrained Actor-Critic Reinforcement Learning
Instant delivery has developed rapidly in recent years and significantly changed the lifestyle of people due to its timeliness and convenience. In instant delivery, the order dispatch process is concurrent. Couriers take new orders continuously and deliver multiple orders in a delivery trip (i.e., a batch). The delivery time of orders in a batch is often overlapped and interlinked with each other. The pickup and delivery sequence of the existing orders in a batch changes dynamically due to time constraints and real-time overdue possibility (i.e., the rate of deliveries that are not finished in promised time). Most of existing order dispatch mechanisms are designed for independent order dispatch or concurrent delivery without strict time constraints, hence are incapable of handling real-time concurrent dispatch with strict time constraints in on-demand instant delivery.To address the challenge, we propose a Time-Constrained Actor-Critic Reinforcement learning based concurrent dispatch system called TCAC-Dispatch to enhance the long-term overall revenue and reduce the overdue rate. Specifically, we design a deep matching network (DMN) with a variable action space, which integrates the state embedding (including route behaviors encoding) and actions embedding features into a long-term matching value. Then the Actor-Critic model tackles the concurrent order dispatch problem considering strict time constraints and stochastic demand-supply in instant delivery. An estimated-time based action pruning module is designed to ensure time constraints guarantee and accelerate the training as well as dispatching processes. We evaluate the TCAC-Dispatch with one-month data involved with 36.48 million orders and 42,000 couriers collected from one of the largest instant delivery companies in China, i.e., Eleme. Empirical experiments are conducted on a data-driven emulator deployed on the development environment of Eleme and results show that our method achieves 22% of the increase in total revenue and reduces the overdue rate by 21.6%.