Multi-Objective Distributional Reinforcement Learning for Large-Scale Order Dispatching

Fan Zhou, Chenfan Lu, Xiaocheng Tang, Fan Zhang, Zhiwei Qin, Jieping Ye, Hongtu Zhu
2021 IEEE International Conference on Data Mining (ICDM), December 2021
DOI: 10.1109/ICDM51629.2021.00202
Citations: 3

Abstract

The aim of this paper is to develop a multi-objective distributional reinforcement learning framework for improving order dispatching on large-scale ride-hailing platforms. Compared with traditional RL-based approaches that focus on drivers’ income, the proposed framework also accounts for the spatiotemporal difference between the supply and demand networks. Specifically, we model the dispatching problem as a two-objective Semi-Markov Decision Process (SMDP) and estimate the relative importance of the two objectives under some unknown existing policy via Inverse Reinforcement Learning (IRL). Then, we combine Implicit Quantile Networks (IQN) with the traditional Deep Q-Networks (DQN) to jointly learn the two return distributions and adjust their weights to refine the old policy through online planning, achieving higher supply–demand coherence on the platform. We conduct large-scale dispatching experiments to demonstrate the remarkable improvement of the proposed approach on the platform’s efficiency.
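The multi-objective scalarization the abstract describes, combining a drivers'-income return distribution with a supply–demand-coherence return distribution under an IRL-estimated weight, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the function names are hypothetical, and plain NumPy arrays of quantile samples stand in for the outputs of the actual IQN/DQN heads.

```python
import numpy as np


def scalarized_values(q_income, q_coherence, w):
    """Weighted sum of the two per-action objective estimates.

    w is the relative importance of the income objective (as the paper
    estimates via IRL); (1 - w) weights supply-demand coherence.
    """
    return w * q_income + (1.0 - w) * q_coherence


def select_action(quantiles_income, quantiles_coherence, w):
    """Greedy action under the scalarized two-objective criterion.

    quantiles_*: arrays of shape (n_actions, n_quantiles) holding quantile
    samples that approximate each objective's return distribution, as an
    IQN-style head would produce. Averaging the quantiles recovers an
    expected-return estimate per action.
    """
    q_income = quantiles_income.mean(axis=1)
    q_coherence = quantiles_coherence.mean(axis=1)
    return int(np.argmax(scalarized_values(q_income, q_coherence, w)))


# Toy example: two candidate dispatch actions, two quantile samples each.
income = np.array([[1.0, 1.0], [3.0, 3.0]])     # action 1 pays more
coherence = np.array([[5.0, 5.0], [0.0, 0.0]])  # action 0 balances supply/demand
```

With `w = 1.0` the income objective dominates and action 1 is chosen; with `w = 0.0` the coherence objective dominates and action 0 is chosen, showing how shifting the learned weight refines the dispatch policy.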