Conditional Generative Adversarial Network-Based roadway crash risk prediction considering heterogeneity with dynamic data
Nuri Park, Juneyoung Park, Chris Lee
Journal of Safety Research, Volume 92, Pages 217-229 (February 2025)
DOI: 10.1016/j.jsr.2024.12.001
https://www.sciencedirect.com/science/article/pii/S0022437524002093
Abstract
Introduction: Roadway crashes are rare and occur randomly, which makes it challenging to develop crash prediction models for real-time traffic safety management. To address the small sample size of crash data, recent studies have augmented crash data with machine learning techniques when developing safety evaluation models. However, the characteristics of crash data vary with spatial and temporal conditions, so these characteristics should be incorporated into both data augmentation and crash risk assessment.

Method: This study developed a real-time crash risk model in three stages. First, crash data were clustered to define heterogeneous crash risk situations, and key variables were then selected with Boruta-SHAP, a technique that combines ensemble-based feature selection with explainable artificial intelligence. Second, the crash data in each cluster were augmented using oversampling techniques, including a Conditional Generative Adversarial Network (CGAN) that can account for the characteristics of each crash risk cluster. Finally, crash risk models were developed with a binary logistic regression model (BLM), Random Forest (RF), extreme gradient boosting (XGBoost), and a Support Vector Machine (SVM), and their performance was compared.

Results: The CGAN-based XGBoost model performed best, and the temporal speed difference at 10-minute intervals and precipitation had a large impact on crash risk prediction. This paper emphasizes that distinct crash risk characteristics must be distinguished in crash risk prediction, and it provides new insights into addressing the imbalance between crash and non-crash datasets.
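The abstract does not describe the CGAN architecture, so the sketch below does not reproduce the authors' method. As a minimal stand-in, assuming each crash record is a numeric feature vector, it illustrates the underlying idea of cluster-conditioned augmentation: records are first grouped with plain k-means, and synthetic crash records are then generated only from real records in the same cluster. It uses SMOTE-style interpolation between nearest neighbours instead of a CGAN. The function names, the two features, and the toy values are all hypothetical.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster label for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each record to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def oversample_per_cluster(points, labels, target, k_neighbors=3, seed=0):
    """Hypothetical helper: SMOTE-style interpolation restricted to each cluster
    (a stand-in for the paper's CGAN), so every synthetic record is built only
    from real records of the same crash risk cluster."""
    rng = random.Random(seed)
    synthetic = []  # list of (feature_vector, cluster_label)
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if len(members) < 2:
            continue  # cannot interpolate from a single record
        for _ in range(max(0, target - len(members))):
            i = rng.randrange(len(members))
            base = members[i]
            others = [m for j, m in enumerate(members) if j != i]
            nearest = sorted(others, key=lambda m: math.dist(base, m))[:k_neighbors]
            neighbor = rng.choice(nearest)
            gap = rng.random()  # position along the segment base -> neighbor
            new = [b + gap * (n - b) for b, n in zip(base, neighbor)]
            synthetic.append((new, c))
    return synthetic


# Hypothetical crash records: (10-min speed difference in km/h, precipitation in mm/h).
crashes = [(5.0, 0.0), (6.0, 0.2), (4.5, 0.1), (5.5, 0.0),  # dry, small speed change
           (30.0, 8.0), (28.0, 9.5), (32.0, 7.5)]            # wet, large speed change
labels = kmeans(crashes, k=2)
synth = oversample_per_cluster(crashes, labels, target=10)
print(Counter(labels), Counter(c for _, c in synth))
```

Restricting interpolation to a single cluster keeps synthetic records from blending, for example, dry low-speed-change crashes with wet high-speed-change crashes. A CGAN pursues a similar goal differently, by conditioning both its generator and its discriminator on a label such as the cluster ID.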
About the journal:
Journal of Safety Research is an interdisciplinary publication that provides a forum for exchanging ideas and scientific evidence from research in all areas of safety and health, including traffic, workplace, home, and community safety. The journal invites research using rigorous methodologies, encourages translational research, and engages the global scientific community through various partnerships (for example, by highlighting some of the latest findings from the U.S. Centers for Disease Control and Prevention).