Yiping Wu, Hongpeng Zhang, Peng Song, Xiaoheng Sun, Ji Meng, Jun Ma, Liwei Gao
Title: Developing an XGBoost based model to predict the probability of truck crashes driven by macro operation and insurance data.
Journal: Traffic Injury Prevention, pages 1-10 (impact factor 1.9; JCR Q3, Public, Environmental & Occupational Health)
Published: 2025-09-15
DOI: https://doi.org/10.1080/15389588.2025.2545002
Citation count: 0
Abstract
Objective: Truck accidents often cause significant casualties. Establishing a scientific truck accident prediction model and identifying the primary causes of accidents are crucial for proactive accident prevention.
Methods: The proposed model was developed using annual operational behavior data and corresponding insurance claim information from commercial trucks. Prior to model training, multicollinearity among predictor variables was addressed to ensure model interpretability and stability. Model performance was evaluated using recall, F1 score, and overall prediction accuracy, including external validation with a temporally separated dataset from the same driver population. To reduce input data dependency, an input dimensionality reduction analysis was conducted to determine the minimal data requirements. SHAP (Shapley Additive Explanations) values and principal component coefficients were employed to extract the main factors influencing truck accidents.
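The evaluation metrics named in the Methods can be illustrated with a minimal sketch. It assumes a binary crash / no-crash outcome and computes recall, precision, F1 score, and accuracy from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute recall, precision, F1 and accuracy from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives for the positive ("crash") class.
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=140)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Recall is the natural headline metric for this kind of task because a missed crash-prone truck (a false negative) is usually costlier than a false alarm.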
Results: The truck accident prediction model achieved a recall of 84.21% and an F1 score of 85.33%. Prediction accuracy reached 87.59% when the model was validated on new data from the same group of truckers in the subsequent year. Analysis of how prediction accuracy varies with feature inputs of different dimensions showed that the minimum data requirement for the model is the combination of load capacity, road segment type, and driving time. With these suggested inputs, the recall and F1 score of the prediction model are 86.84% and 84.62%, respectively. SHAP values and Principal Component Analysis (PCA) coefficients indicated that the trucker's familiarity with the road and the type of road segment significantly affect the probability of an accident.
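The abstract does not say how the minimum feature set was searched for. One plausible reading is sketched below, assuming an exhaustive search: score every feature subset of each size, then keep the smallest subset whose score stays within a tolerance of the full-feature score. The function names, the `tolerance` parameter, the `vehicle_age` feature, and the toy scoring weights are all hypothetical. In practice `score_fn` would train and validate the model on the given subset.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Tuple

Subset = Tuple[str, ...]


def best_subset_per_size(
    features: Sequence[str], score_fn: Callable[[Subset], float]
) -> Dict[int, Tuple[float, Subset]]:
    """For each subset size k, return the best (score, subset) pair."""
    results = {}
    for k in range(1, len(features) + 1):
        results[k] = max((score_fn(s), s) for s in combinations(features, k))
    return results


def minimal_feature_set(
    features: Sequence[str],
    score_fn: Callable[[Subset], float],
    tolerance: float = 0.01,
) -> Tuple[Subset, float]:
    """Smallest subset scoring within `tolerance` of the full feature set."""
    results = best_subset_per_size(features, score_fn)
    full_score = results[len(features)][0]
    for k in sorted(results):
        score, subset = results[k]
        if score >= full_score - tolerance:
            return subset, score
    return tuple(features), full_score  # unreachable: k = len(features) matches


# Toy stand-in for "train the model on this subset and validate it".
# The weights are invented purely to exercise the search.
TOY_WEIGHTS = {
    "load_capacity": 0.30,
    "road_segment_type": 0.30,
    "driving_time": 0.25,
    "vehicle_age": 0.005,  # hypothetical feature, not named in the abstract
}


def toy_score(subset: Subset) -> float:
    return sum(TOY_WEIGHTS[name] for name in subset)


subset, score = minimal_feature_set(list(TOY_WEIGHTS), toy_score)
print(subset, round(score, 3))
```

Exhaustive search costs 2^n model fits for n features, so it only suits small feature sets. A greedy forward or backward selection would be the usual substitute for larger ones.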
Conclusions: This research establishes a truck accident prediction model driven by macro-level data, which eases the difficulty of data collection while maintaining prediction accuracy.
About the journal:
The purpose of Traffic Injury Prevention is to bridge the disciplines of medicine, engineering, public health and traffic safety in order to foster the science of traffic injury prevention. The archival journal focuses on research, interventions and evaluations within the areas of traffic safety, crash causation, injury prevention and treatment.
General topics within the journal's scope are driver behavior, road infrastructure, emerging crash avoidance technologies, crash and injury epidemiology, alcohol and drugs, impact injury biomechanics, vehicle crashworthiness, occupant restraints, pedestrian safety, evaluation of interventions, economic consequences and emergency and clinical care with specific application to traffic injury prevention. The journal includes full length papers, review articles, case studies, brief technical notes and commentaries.