Using traffic data to identify land-use characteristics based on ensemble learning approaches

IF 2.2 · Zone 4 (Engineering and Technology) · JCR Q4 (TRANSPORTATION)

Journal of Transport and Land Use · Published: 2023-01-13 · DOI: 10.5198/jtlu.2023.2218

Jiahui Zhao, Zhibin Li, Pan-xue Liu


Citations: 2

Abstract

The land-use identification process, which quantifies the types and intensity of human activities at the regional level, is a critical investigative step in ongoing land-use planning. One limitation of current land-use identification practice is its reliance on theory-driven models built on survey and socioeconomic data, which are often considered costly and time-consuming to collect. Another limitation is that most of these identification methods cannot incorporate the effect of daily human activity, so significant spatial heterogeneity is ignored. In this context, a novel land-use identification framework is proposed that quantifies land-use characteristics using traffic-flow and traffic-event data. For the identification models, two widely used ensemble learning methods, Random Forest and AdaBoost, are introduced to classify land-use type and to fit land-use density. The case study collected transit vehicle positions, traffic events, and geo-tagged data at the regional level in the San Francisco Bay Area, California. The results demonstrate that the framework with ensemble learning identifies land-use characteristics with high accuracy in both the type classification and the density regression tasks. Compared with other models, the average results improved by 12.63%, 12.84%, 11.05%, 5.44%, and 12.84% for Area Under the ROC Curve (AUC), Classification Accuracy (CA), F-Measure (F1), Precision, and Recall, respectively, in the classification tasks, and by 56.81%, 21.20%, and 47.29% for Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), respectively, in the regression tasks. The Random Forest model performs better on labels with high regularity, such as education, residence, and work activities. Beyond accuracy, a correlation analysis of the error terms showed that the results are consistent with common-sense expectations about land-use characteristics, demonstrating the interpretability of the proposed framework.
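The abstract evaluates the ensemble models on two tasks, land-use type classification and land-use density regression, using AUC, CA, F1, Precision, Recall, MSE, RMSE, and MAE. The paper's own evaluation code is not given here, so the following is only a minimal, standard-library Python sketch of those metrics under stated assumptions: per-class scores are support-weighted averages, AUC is computed for a single binary (one-vs-rest) label, and "improvement" is taken to mean relative change against a baseline model. All function names are hypothetical.

```python
"""Minimal evaluation helpers for land-use type classification and density regression.

Assumptions (not from the paper): per-class scores are support-weighted
averages, AUC is computed for one binary (one-vs-rest) label, and
"improvement" means relative change against a baseline model.
All function names are hypothetical.
"""
import math
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Classification accuracy (CA) plus support-weighted Precision, Recall and F1."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n
    precision = recall = f1 = 0.0
    # Average over the classes present in y_true, weighted by their frequency.
    for label, count in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for _, p in pairs)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"CA": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


def binary_auc(y_true, scores):
    """Area under the ROC curve for a 0/1 label via pairwise ranking (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t != 1]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def regression_metrics(y_true, y_pred):
    """MSE, RMSE and MAE for predicted land-use density values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


def relative_improvement(model, baseline, lower_is_better=False):
    """Percentage improvement of a model score over a baseline score.

    For error metrics (MSE, RMSE, MAE) a smaller value is better, so the
    sign of the difference is flipped.
    """
    change = baseline - model if lower_is_better else model - baseline
    return 100.0 * change / baseline
```

For example, on the toy labels `["residence", "work", "education", "residence"]` with predictions `["residence", "work", "residence", "residence"]`, `classification_metrics` returns CA = Recall = 0.75. With support-weighted averaging, Recall always equals CA, because each class contributes (n_c/n)(TP_c/n_c) = TP_c/n; this may explain why the abstract reports the same 12.84% improvement for both, although the abstract does not state which averaging was used.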

Source journal

Journal of Transport and Land Use (TRANSPORTATION)

CiteScore: 3.40

Self-citation rate: 5.30%

Review time: 30 weeks

Journal description: The Journal of Transport and Land Use publishes original interdisciplinary papers on the interaction of transport and land use. Domains include engineering, planning, modeling, behavior, economics, geography, regional science, sociology, architecture and design, network science, and complex systems. Papers reporting innovative methodologies, original data, and new empirical findings are especially encouraged.