Prediction of Bike-sharing Trip Counts: Comparing Parametric Spatial Regression Models to a Geographically Weighted XGBoost Algorithm
Katja Schimohr, Philipp Doebler, Joachim Scheiner
Geographical Analysis, 55(4), 651-684. Published 2022-11-29. DOI: 10.1111/gean.12354
Regression models are commonly applied in the analysis of transportation data. This research aims to broaden the range of methods used for this task. It models the spatial distribution of bike-sharing trips in Cologne, Germany, with both parametric regression models and a modified machine learning approach, while incorporating measures to account for spatial autocorrelation. The independent variables comprise land use types, elements of the transport system, and sociodemographic characteristics. Several regression models with different underlying distributions were considered. Of these, a Tweedie generalized additive model is selected on the basis of its AIC, RMSE, and sMAPE values and compared with an XGBoost model. To account for spatial relationships, spatial splines are included in the Tweedie model, while the XGBoost estimates are adjusted using a geographically weighted regression. Each method has its advantages. XGBoost achieves far better RMSE and sMAPE values and therefore a better model fit. The Tweedie model, by contrast, makes the influence of the independent variables, including spatial effects, easier to interpret.
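The abstract compares candidate models by RMSE and sMAPE (alongside AIC). As a hedged illustration, the sketch below computes the two error metrics in plain Python. sMAPE has several competing definitions. This sketch assumes the (|A| + |F|) / 2 denominator and treats terms where both the observed and predicted count are 0 as contributing 0. Neither choice is confirmed as the paper's definition. AIC is not sketched because it depends on each fitted model's likelihood. The trip counts in the demo are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((A - F)^2)), in the units of the counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def smape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error in percent (range 0 to 200).

    Assumed definition: mean of |F - A| / ((|A| + |F|) / 2), times 100.
    Assumed convention: a term with A == F == 0 contributes 0, since zero
    trip counts are common and the ratio would otherwise be 0/0.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        if denom > 0:  # skip the undefined 0/0 case (contributes 0)
            total += abs(p - a) / denom
    return 100 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical trip counts for four zones (illustrative only).
    observed = [0, 3, 10, 4]
    predicted = [0, 2, 12, 4]
    print(round(rmse(observed, predicted), 3))   # 1.118
    print(round(smape(observed, predicted), 2))  # 14.55
```

Lower values are better for both metrics. RMSE is expressed in trip-count units and penalizes large errors more heavily. sMAPE is scale-free and bounded, so the two give complementary views when ranking candidate models.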
About the journal:
First in its specialty area and one of the most frequently cited publications in geography, Geographical Analysis has, since 1969, presented significant advances in geographical theory, model building, and quantitative methods to geographers and scholars in a wide spectrum of related fields. Traditionally, mathematical and nonmathematical articulations of geographical theory, and statements and discussions of the analytic paradigm are published in the journal. Spatial data analyses and spatial econometrics and statistics are strongly represented.