Dynamic Bike Reposition: A Spatio-Temporal Reinforcement Learning Approach

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Pub Date: 2018-07-19. DOI: 10.1145/3219819.3220110

Yexin Li, Yu Zheng, Qiang Yang

Citations: 93

Abstract

Bike-sharing systems are widely deployed in many major cities, but jammed and empty stations cause severe customer loss. Operators currently try to reposition bikes among stations continuously while the system is operating. However, how to reposition efficiently so as to minimize customer loss over a long period remains unsolved. We propose a spatio-temporal reinforcement learning based bike reposition model to address this problem. First, we propose an inter-independent inner-balance clustering algorithm that groups stations into clusters with two properties: each cluster is inner-balanced, and each is independent of the others. Because many trikes reposition bikes simultaneously in a very large system, clustering is necessary to reduce the problem complexity. Second, we allocate multiple trikes to each cluster to conduct inner-cluster bike reposition. A spatio-temporal reinforcement learning model is designed for each cluster to learn a reposition policy within it, aiming to minimize the cluster's customer loss over a long period. To learn each model, we design a deep neural network that estimates its optimal long-term value function, from which the optimal policy can be easily inferred. Besides formulating the model in a multi-agent way, we further reduce its training complexity with two spatio-temporal pruning rules. Third, we design a system simulator based on two predictors to train and evaluate the reposition model. Experiments on real-world datasets from Citi Bike confirm the effectiveness of our model.
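The abstract notes that the optimal policy can be inferred from an estimated long-term value function, and that pruning rules shrink the set of actions considered. The sketch below illustrates that general idea only, under loud assumptions: the value function is treated as an action-value estimate supplied as a plain Python callable, the state and action encodings are hypothetical, and the `too_far` rule is a placeholder rather than either of the paper's two spatio-temporal pruning rules.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

# Hypothetical encodings for illustration; they are not taken from the paper.
State = Hashable
Action = Tuple[str, int]  # (station to visit, bikes to unload; negative means load)


def greedy_action(
    state: State,
    actions: Iterable[Action],
    q_value: Callable[[State, Action], float],
    is_pruned: Callable[[State, Action], bool] = lambda s, a: False,
) -> Optional[Action]:
    """Return the unpruned action with the highest estimated long-term value."""
    best_action: Optional[Action] = None
    best_value = float("-inf")
    for action in actions:
        if is_pruned(state, action):
            continue  # a pruning rule removed this candidate
        value = q_value(state, action)
        if value > best_value:
            best_action, best_value = action, value
    return best_action


# Toy value estimates standing in for a trained network's output
# (here read as negative expected customer loss).
TOY_Q = {
    ("morning", ("A", 5)): -3.0,
    ("morning", ("B", -4)): -1.5,
    ("morning", ("C", 2)): -0.5,
}


def toy_q(state: State, action: Action) -> float:
    return TOY_Q[(state, action)]


def too_far(state: State, action: Action) -> bool:
    # Placeholder rule: pretend station "C" is out of reach for this trike.
    return action[0] == "C"


candidates = [("A", 5), ("B", -4), ("C", 2)]
print(greedy_action("morning", candidates, toy_q))           # ('C', 2)
print(greedy_action("morning", candidates, toy_q, too_far))  # ('B', -4)
```

Pruning here only narrows the candidate set before the argmax, so fewer value estimates need to be computed; how the paper's rules decide which candidates to drop is not specified in the abstract.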
