Pengfei Cui , Mohamed Abdel-Aty , Lei Han , Xiaobao Yang
Multiscale geographical random forest: A novel spatial ML approach for traffic safety modeling integrating street-view semantic visual features
Transportation Research Part C-Emerging Technologies, Volume 179, Article 105299 (Journal Article), published 2025-08-22
DOI: 10.1016/j.trc.2025.105299
URL: https://www.sciencedirect.com/science/article/pii/S0968090X25003031
Journal impact factor: 7.6; JCR: Q1 (Transportation Science & Technology)
Abstract
Macro-level traffic safety modeling aims to identify critical risk factors for regional crashes, providing an essential basis for effective countermeasures by traffic managers. Previous work mainly incorporated macro-level, static socio-demographic and infrastructure features, overlooking drivers’ visual perception of the environment, which crucially influences their driving behavior and thus safety. Moreover, spatial machine learning (ML) has gained prominence for its strong crash prediction performance. However, existing spatial ML methods typically apply spatial effects at a fixed or homogeneous scale (e.g., specific Euclidean distances), limiting their ability to capture the multiscale spatial heterogeneity of features. To address these gaps, an emerging image semantic segmentation technique is employed to extract visual environment features (e.g., buildings, trees) from Google Street View (GSV) images. A novel spatial ML method, Multiscale Geographical Random Forest (MGRF), is proposed to overcome fixed-spatial-scale constraints and enable adaptive multiscale spatial modeling. Empirical experiments on Southeast Florida show that including visual environment features from 228,352 street view images notably improves crash prediction. Compared to traditional models (e.g., multiscale geographically weighted regression), MGRF fits optimal spatial bandwidths for each sample, achieving improvements of 30.31%, 9.98%, and 5.53% in MSE, MAE, and R², respectively. By incorporating SHapley Additive exPlanations (SHAP), MGRF identified key risk features for each region and quantified their spatial heterogeneity. The results reveal that in urban core areas, the proportion of cars in GSV images, which reflects road traffic conditions, is the most critical feature and contributes positively to crashes. In contrast, in suburban regions, lower road density and abundant green spaces are associated with fewer crashes. This study highlights the significant potential of integrating street-view semantic visual features with multiscale spatial ML to enhance traffic safety analysis.
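The abstract reports percentage improvements in MSE, MAE, and R² without stating how they are computed. A minimal sketch of one common convention follows, assuming (this is not confirmed by the source) that the gain is the relative change against the baseline model: a reduction for the error metrics and an increase for R². The toy data and all function names are hypothetical and unrelated to the paper's results.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline, candidate, higher_is_better=False):
    """Percentage improvement of candidate over baseline (assumed convention)."""
    if higher_is_better:
        return 100.0 * (candidate - baseline) / abs(baseline)
    return 100.0 * (baseline - candidate) / abs(baseline)


# Hypothetical crash counts and predictions from two models (illustration only).
y_true = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [14.0, 16.0, 34.0, 36.0]   # every prediction is off by 4
candidate_pred = [12.0, 18.0, 32.0, 38.0]  # every prediction is off by 2

mse_gain = relative_improvement(mse(y_true, baseline_pred), mse(y_true, candidate_pred))
mae_gain = relative_improvement(mae(y_true, baseline_pred), mae(y_true, candidate_pred))
r2_gain = relative_improvement(
    r2(y_true, baseline_pred), r2(y_true, candidate_pred), higher_is_better=True
)

print(f"MSE improvement: {mse_gain:.2f}%")
print(f"MAE improvement: {mae_gain:.2f}%")
print(f"R2 improvement:  {r2_gain:.2f}%")
```

Under this convention, halving every residual cuts MSE by 75% and MAE by 50%, while the R² gain is much smaller because R² is already close to 1 for the baseline. This is one reason the three reported percentages can differ widely in size.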
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.