Divide-and-train: A new approach to improve the predictive tasks of bike-sharing systems.

IF 2.6 · Zone 4 (Engineering and Technology) · Q1 Mathematics

Mathematical Biosciences and Engineering Pub Date : 2024-07-02 DOI:10.3934/mbe.2024282

Ahmed Ali, Ahmad Salah, Mahmoud Bekhit, Ahmed Fathalla

Mathematical Biosciences and Engineering, vol. 21, no. 7, pp. 6471-6492. Journal Article. https://doi.org/10.3934/mbe.2024282

Citations: 0

Abstract

Bike-sharing systems (BSSs) have become commonplace in most cities worldwide and are an important part of many smart cities. These systems continuously generate large volumes of data, and their effectiveness depends on making decisions at the right time. Thus, there is a vital need to build predictive models on BSS data to improve decision-making. The overwhelming majority of BSS users register before using the service, so several BSSs have prior knowledge of user data such as age, gender, and other relevant details. Several machine learning and deep learning models are used to predict, for instance, urban flows and trip duration. The standard practice is to train these models on the entire dataset, even though the biking patterns of different users are intuitively distinct; for instance, a user's age influences the duration of a trip. This work was motivated by the existence of such distinct user patterns. We propose divide-and-train, a new method for training predictive models on station-based BSS datasets by dividing the original dataset according to the values of a given dataset attribute. The proposed method was validated on different machine learning and deep learning models, each trained on both the complete and the split datasets, and the resulting improvements in the evaluation metrics were reported. Results demonstrated that the proposed method outperformed the conventional training approach. Specifically, the root mean squared error (RMSE) and mean absolute error (MAE) improved for both trip duration and distance prediction, with an average accuracy of 85% across the divided sub-datasets for the best-performing model, random forest.
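
The divide-and-train idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' setup: the trips are synthetic, the splitting attribute is a hypothetical `age_group`, and a one-feature least-squares line stands in for the paper's models (such as random forest). The sketch trains one model on the full dataset and one model per sub-dataset, then compares RMSE and MAE on held-out trips.

```python
import math
import random
from collections import defaultdict


def fit_line(xs, ys):
    """Closed-form least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return slope, mean_y - slope * mean_x


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def make_trips(n, rng):
    """Synthetic (age_group, distance_km, duration_min) rows; values are made up."""
    minutes_per_km = {"young": 4.0, "middle": 5.5, "senior": 8.0}
    groups = list(minutes_per_km)
    trips = []
    for _ in range(n):
        group = rng.choice(groups)
        distance = rng.uniform(0.5, 10.0)
        duration = minutes_per_km[group] * distance + rng.gauss(0.0, 1.0)
        trips.append((group, distance, duration))
    return trips


def evaluate(train, test):
    # Conventional training: a single model fitted on the entire dataset.
    global_model = fit_line([d for _, d, _ in train], [t for _, _, t in train])

    # Divide-and-train: split on the attribute, then fit one model per subset.
    by_group = defaultdict(list)
    for group, distance, duration in train:
        by_group[group].append((distance, duration))
    group_models = {
        g: fit_line([d for d, _ in rows], [t for _, t in rows])
        for g, rows in by_group.items()
    }

    y_true = [t for _, _, t in test]
    pred_global = [global_model[0] * d + global_model[1] for _, d, _ in test]
    pred_split = []
    for group, distance, _ in test:
        # Fall back to the global model if a group was never seen in training.
        slope, intercept = group_models.get(group, global_model)
        pred_split.append(slope * distance + intercept)

    return {
        "global": {"rmse": rmse(y_true, pred_global), "mae": mae(y_true, pred_global)},
        "split": {"rmse": rmse(y_true, pred_split), "mae": mae(y_true, pred_split)},
    }


rng = random.Random(0)
trips = make_trips(600, rng)
results = evaluate(trips[:450], trips[450:])
for name, scores in results.items():
    print(f"{name}: RMSE={scores['rmse']:.2f}  MAE={scores['mae']:.2f}")
```

Because the synthetic groups are constructed with different minutes-per-kilometre rates, the per-group models have lower error here by design; the sketch shows the mechanics of the comparison and says nothing about the size of the gains reported in the paper.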

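Source journal

Mathematical Biosciences and Engineering (Engineering and Technology: Mathematics, Interdisciplinary Applications)

CiteScore: 3.90

Self-citation rate: 7.70%

Articles published: 586

Review time: >12 weeks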

Journal description: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing. MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (description of new technologies and products); Announcements and Industrial Progress and News (announcements and even advertisement, including major conferences).