Divide-and-train: A new approach to improve the predictive tasks of bike-sharing systems
Ahmed Ali, Ahmad Salah, Mahmoud Bekhit, Ahmed Fathalla
Mathematical Biosciences and Engineering, 21(7), pp. 6471-6492. Published 2024-07-02. Journal Article.
DOI: 10.3934/mbe.2024282 (https://doi.org/10.3934/mbe.2024282)
Impact factor: 2.6. JCR: Q1 (Mathematics).
Citations: 0
Abstract
Bike-sharing systems (BSSs) have become commonplace in most cities worldwide and are an important part of many smart cities. These systems continuously generate large volumes of data. Their effectiveness depends on making decisions at the proper time, so there is a vital need to build predictive models on BSS data to improve decision-making. The overwhelming majority of BSS users register before using the service, so several BSSs have prior knowledge of user data such as age, gender, and other relevant details. Several machine learning and deep learning models are used to predict, for instance, urban flows and trip duration. The standard practice is to train these models on the entire dataset, even though the biking patterns of different users are intuitively distinct; for instance, a user's age influences the duration of a trip. The existence of such distinct user patterns motivated this work. We propose divide-and-train, a new method for training predictive models on station-based BSS datasets by dividing the original dataset on the values of a given attribute. The method was validated on different machine learning and deep learning models, each trained on both the complete and the split datasets, and the resulting improvements in the evaluation metrics were reported. Results demonstrated that the proposed method outperformed the conventional training approach. Specifically, the root mean squared error (RMSE) and mean absolute error (MAE) improved for both trip duration and distance prediction, with an average accuracy of 85% across the divided sub-datasets for the best-performing model, random forest.
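The idea of splitting on an attribute and training one model per subset can be sketched as follows. This is a minimal illustration and not the authors' implementation: the trip data are synthetic, the attribute name `age_group`, its two values, and the pace figures are invented assumptions, and a one-variable least-squares line stands in for the random forest and other models evaluated in the paper.

```python
import math
import random
from collections import defaultdict


def make_trips(n, seed):
    """Generate synthetic trips. The per-group paces are invented assumptions."""
    rng = random.Random(seed)
    pace_min_per_km = {"under_40": 4.0, "40_plus": 8.0}
    trips = []
    for _ in range(n):
        group = rng.choice(sorted(pace_min_per_km))
        distance_km = rng.uniform(0.5, 5.0)
        duration_min = pace_min_per_km[group] * distance_km + rng.gauss(0.0, 1.0)
        trips.append(
            {"age_group": group, "distance_km": distance_km, "duration_min": duration_min}
        )
    return trips


def fit_line(trips):
    """Least-squares fit of duration = slope * distance + intercept (stand-in model)."""
    xs = [t["distance_km"] for t in trips]
    ys = [t["duration_min"] for t in trips]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, trip):
    slope, intercept = model
    return slope * trip["distance_km"] + intercept


def divide_and_train(trips, attribute):
    """Split the training set on one attribute and fit one model per subset."""
    subsets = defaultdict(list)
    for trip in trips:
        subsets[trip[attribute]].append(trip)
    return {value: fit_line(rows) for value, rows in subsets.items()}


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


train = make_trips(2000, seed=0)
test = make_trips(500, seed=1)

# Conventional approach: a single model trained on the complete dataset.
global_model = fit_line(train)
global_errors = [predict(global_model, t) - t["duration_min"] for t in test]

# Divide-and-train: route each test trip to the model of its own subset.
models = divide_and_train(train, "age_group")
divided_errors = [predict(models[t["age_group"]], t) - t["duration_min"] for t in test]

print("global : RMSE", rmse(global_errors), "MAE", mae(global_errors))
print("divided: RMSE", rmse(divided_errors), "MAE", mae(divided_errors))
```

On this synthetic data the per-subset models have lower error because the two groups were generated with different paces by construction. Whether a split helps on real BSS data depends on how strongly the chosen attribute actually separates riding patterns.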
About the journal:
Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing.
MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); and Announcements and Industrial Progress and News (announcements and even advertisements, including for major conferences).