Antonella Scarano, Matin Sadeghi, Filomena Mauriello, Maria Rella Riccardi, Kayvan Aghabayk, Alfonso Montella
Cyclist crash severity modeling: A hybrid approach of XGBoost-SHAP and random parameters logit with heterogeneity in means and variances
Journal of Safety Research, vol. 93, pp. 373-398. Published 2025-05-05. DOI: 10.1016/j.jsr.2025.04.003
Impact factor: 3.9. JCR: Q1 (Ergonomics).
https://www.sciencedirect.com/science/article/pii/S0022437525000611
Citation count: 0
Abstract
Introduction: Across the globe, policymakers are promoting sustainable transport options, notably cycling, to foster eco-friendly urban environments. However, the persistent safety challenges that cyclists face continue to hinder these efforts. Method: This research explores a novel hybrid methodology for investigating the determinants of cyclist crash severity, combining eXtreme Gradient Boosting (XGBoost) with SHapley Additive exPlanations (SHAP) and a random parameters logit model with heterogeneity in means and variances (RPLHMV). The methodology's effectiveness is evaluated using Department for Transport data on crashes in Great Britain from 2016 to 2019. The XGBoost-SHAP model reduced the dimensionality of the data, making it feasible to apply a robust statistical model, while the RPLHMV captured heterogeneity in both the means and the variances of the random parameters. Results: The statistical model identified 10 significant variables with fixed parameters for fatal crashes and 22 significant variables for serious injuries. Two indicator variables, cyclist age ≤ 17 and overtaking as the second vehicle's manoeuvre, had statistically significant random parameters associated with serious injury outcomes. The relationships revealed by the logit framework were further examined with XGBoost-SHAP, which provided deeper insight into the interactions between random and fixed parameters. The hybrid approach achieved a very good McFadden R² of 0.52 for the RPLHMV, demonstrating the model's robustness. Conclusions: The hybrid approach not only provides a deeper understanding of crash severity dynamics but also helps in designing specific safety measures. Practical applications: This research can guide policymakers in identifying the key factors and interactions that affect crash severity, leading to targeted safety improvements.
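For readers unfamiliar with the fit statistic quoted in the abstract, the McFadden R² compares the log-likelihood of the fitted model with that of a baseline model: R² = 1 - LL(model) / LL(baseline). The Python sketch below is only an illustration under stated assumptions. The function names, the sample size, the three-level severity outcome, and the log-likelihood values are all hypothetical and are not taken from the paper. The abstract also does not say which baseline was used (equal shares or constants only), so an equal-shares baseline is assumed here.

```python
import math


def equal_shares_log_likelihood(n_obs: int, n_outcomes: int) -> float:
    """Log-likelihood of a baseline that gives every outcome the same probability.

    Assumption: equal-shares baseline; the paper may have used a different one.
    """
    return n_obs * math.log(1.0 / n_outcomes)


def mcfadden_r2(ll_model: float, ll_baseline: float) -> float:
    """McFadden R^2 = 1 - LL(model) / LL(baseline)."""
    if ll_baseline >= 0:
        raise ValueError("baseline log-likelihood must be negative")
    return 1.0 - ll_model / ll_baseline


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
n_obs = 1000       # made-up number of crash records
n_outcomes = 3     # assumed severity levels, e.g. slight / serious / fatal

ll_baseline = equal_shares_log_likelihood(n_obs, n_outcomes)
ll_model = 0.48 * ll_baseline  # made-up fitted log-likelihood (48% of baseline)

r2 = mcfadden_r2(ll_model, ll_baseline)
print(f"McFadden R^2 = {r2:.2f}")
```

In this made-up case the fitted log-likelihood is 48% of the baseline value, so the statistic is 1 - 0.48 = 0.52. A model that does no better than the baseline scores 0.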
About the journal:
Journal of Safety Research is an interdisciplinary publication that provides for the exchange of ideas and scientific evidence capturing studies through research in all areas of safety and health, including traffic, workplace, home, and community. This forum invites research using rigorous methodologies, encourages translational research, and engages the global scientific community through various partnerships (e.g., this outreach includes highlighting some of the latest findings from the U.S. Centers for Disease Control and Prevention).