Multiscale geographical random forest: A novel spatial ML approach for traffic safety modeling integrating street-view semantic visual features

IF 7.6 · CAS Zone 1 (Engineering & Technology) · JCR Q1 TRANSPORTATION SCIENCE & TECHNOLOGY

Transportation Research Part C: Emerging Technologies · Pub Date: 2025-08-22 · DOI: 10.1016/j.trc.2025.105299

Pengfei Cui, Mohamed Abdel-Aty, Lei Han, Xiaobao Yang

Volume 179, Article 105299

Citations: 0

Abstract

Macro-level traffic safety modeling aims to identify critical risk factors for regional crashes, providing an essential basis for effective countermeasures by traffic managers. Previous work has mainly incorporated macro-level, static socio-demographic and infrastructure features, overlooking drivers’ visual perception of the environment, which crucially influences their driving behavior and thus safety. Moreover, spatial machine learning (ML) has gained prominence for its strong crash prediction performance. However, existing spatial ML methods typically apply spatial effects at a fixed or homogeneous scale (e.g., a specific Euclidean distance), limiting their ability to capture the multiscale spatial heterogeneity of features. To address these gaps, an emerging image semantic segmentation technique is employed to extract visual environment features (e.g., buildings, trees) from Google Street View (GSV) images. A novel spatial ML method, Multiscale Geographical Random Forest (MGRF), is proposed to move beyond fixed spatial scale constraints toward adaptive multiscale spatial modeling. Empirical experiments in Southeast Florida show that including visual environment features from 228,352 street view images notably improves crash prediction. Compared to traditional models (e.g., multiscale geographically weighted regression), MGRF fits an optimal spatial bandwidth for each sample, achieving improvements of 30.31%, 9.98%, and 5.53% in MSE, MAE, and R², respectively. By incorporating SHapley Additive exPlanations (SHAP), MGRF identifies key risk features for each region and quantifies their spatial heterogeneity. The results reveal that in urban core areas, the proportion of cars in GSV images, which reflects road traffic conditions, is the most critical feature, contributing positively to increases in crashes. In contrast, in suburban regions, lower road density and abundant green space are associated with fewer crashes.
This study highlights the significant potential of integrating street-view semantic visual features with multiscale spatial ML to enhance traffic safety analysis.
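The visual environment features are described as proportions of semantic classes (buildings, trees, cars, and so on) in GSV images. A minimal sketch of how such a proportion could be computed, assuming each image has already been segmented into a 2-D grid of integer class labels and that a region's feature is the mean per-image pixel share. The label IDs in `CLASS_NAMES`, and the averaging rule in the hypothetical `region_features` helper, are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical label IDs; a real segmentation model defines its own class list.
# Pixels with labels outside this mapping still count toward the image total.
CLASS_NAMES = {0: "road", 1: "building", 2: "tree", 3: "car", 4: "sky"}


def class_proportions(mask: list[list[int]]) -> dict[str, float]:
    """Share of pixels assigned to each class in one segmented image (mask must be non-empty)."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    return {name: counts.get(label, 0) / total for label, name in CLASS_NAMES.items()}


def region_features(masks: list[list[list[int]]]) -> dict[str, float]:
    """Mean per-image class share over all images in one region (assumed aggregation rule)."""
    per_image = [class_proportions(m) for m in masks]
    return {
        name: sum(p[name] for p in per_image) / len(per_image)
        for name in CLASS_NAMES.values()
    }


if __name__ == "__main__":
    img_a = [[0, 0, 3, 3],
             [1, 1, 2, 4]]
    img_b = [[0, 0, 0, 0],
             [2, 2, 2, 2]]
    print(region_features([img_a, img_b]))
    # prints {'road': 0.375, 'building': 0.125, 'tree': 0.3125, 'car': 0.125, 'sky': 0.0625}
```

Averaging per-image shares gives every image equal weight regardless of resolution; pooling raw pixel counts across a region instead would be an equally plausible choice.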
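SHAP attributes each individual prediction to the input features, which is what allows key risk features to be ranked region by region. As a dependency-free illustration of the underlying Shapley value (not the `shap` library and not the authors' code), the sketch below computes exact Shapley values for one prediction by enumerating feature coalitions, with absent features set to a single baseline vector. It then ranks features within a region by mean absolute attribution. The `predict` callable, baseline, feature names, and the ranking rule in the hypothetical `rank_features` helper are assumptions for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(predict: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_features(predict: Model, rows: Sequence[Sequence[float]],
                  baseline: Sequence[float], names: Sequence[str]) -> list[str]:
    """Order features by mean |Shapley value| over the rows (samples) of one region."""
    totals = [0.0] * len(names)
    for x in rows:
        for j, v in enumerate(shapley_values(predict, x, baseline)):
            totals[j] += abs(v)
    means = {names[j]: totals[j] / len(rows) for j in range(len(names))}
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    def toy(z: Sequence[float]) -> float:
        # Toy model with an interaction term; not a fitted MGRF.
        return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]

    region_rows = [[1.0, 0.5, 2.0], [0.5, 1.5, 1.0]]
    print(rank_features(toy, region_rows, [0.0, 0.0, 0.0], ["x0", "x1", "x2"]))
```

Enumeration is exponential in the number of features, so it only suits toy inputs. For tree ensembles such as random forests, the `shap` package's `TreeExplainer` computes tree-based Shapley values efficiently.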

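Source journal: Transportation Research Part C: Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80
Self-citation rate: 12.00%
Articles published: 332
Review time: 64 days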

Journal description: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.