Developing an XGBoost based model to predict the probability of truck crashes driven by macro operation and insurance data.

IF 1.9 | Zone 3 (Engineering Technology) | JCR Q3 | Public, Environmental & Occupational Health
Yiping Wu, Hongpeng Zhang, Peng Song, Xiaoheng Sun, Ji Meng, Jun Ma, Liwei Gao
Traffic Injury Prevention, pp. 1-10. Journal Article. Published 2025-09-15. DOI: 10.1080/15389588.2025.2545002 (https://doi.org/10.1080/15389588.2025.2545002)
Citations: 0

Abstract

Objective: Truck accidents often cause significant casualties. Establishing a scientific truck accident prediction model and identifying the primary causes are crucial for proactive accident prevention.

Methods: The proposed model was developed using annual operational behavior data and corresponding insurance claim information from commercial trucks. Prior to model training, multicollinearity among predictor variables was addressed to ensure model interpretability and stability. Model performance was evaluated using recall, F1 score, and overall prediction accuracy, including external validation with a temporally separated dataset from the same driver population. To reduce input data dependency, an input dimensionality reduction analysis was conducted to determine the minimal data requirements. SHAP (Shapley Additive Explanations) values and principal component coefficients were employed to extract the main factors influencing truck accidents.
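
The abstract does not say how multicollinearity was addressed. One common approach is to drop features by variance inflation factor (VIF). The sketch below is a minimal illustration under that assumption, not the authors' procedure. The threshold of 10, the synthetic data, and the `mileage` column are hypothetical, and the column names only echo features named in the abstract.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on an intercept plus all remaining columns.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


def drop_collinear(X, names, threshold=10.0):
    """Drop the highest-VIF feature repeatedly until all VIFs <= threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        vifs = variance_inflation_factors(X)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names


# Synthetic example: "mileage" is almost a linear function of "driving_time".
rng = np.random.default_rng(0)
driving_time = rng.normal(size=200)
mileage = 3.0 * driving_time + rng.normal(scale=0.01, size=200)
load_capacity = rng.normal(size=200)

X = np.column_stack([driving_time, mileage, load_capacity])
names = ["driving_time", "mileage", "load_capacity"]
X_kept, kept = drop_collinear(X, names)
print(kept)
```

In this synthetic case one of the two nearly collinear columns is removed and `load_capacity` is retained. Which of the pair is dropped depends on small numerical differences, so in practice the choice is better guided by domain knowledge.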

Results: The truck accident prediction model achieved a recall of 84.21% and an F1 score of 85.33%. Prediction accuracy reached 87.59% when the model was validated on new data from the same group of truckers in the subsequent year. Analysis of how prediction accuracy varies across input feature sets of different dimensions showed that the minimum data requirement is the combination of load capacity, road segment type, and driving time. With these three inputs, the recall and F1 score of the prediction model are 86.84% and 84.62%, respectively. SHAP values and Principal Component Analysis (PCA) coefficients indicated that the trucker's familiarity with the road and the type of road segment significantly affect the probability of an accident.
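
The abstract reports recall and F1 but not precision. Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), precision can be recovered as P = F1 * R / (2R - F1). The short sketch below applies that identity to the reported figures. The derived values are an inference from rounded percentages, not numbers stated by the authors, and the function name is illustrative.

```python
def precision_from_recall_f1(recall, f1):
    """Solve F1 = 2PR / (P + R) for precision P."""
    denom = 2.0 * recall - f1
    if denom <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denom


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


# Reported figures from the abstract (as fractions).
full_model = precision_from_recall_f1(recall=0.8421, f1=0.8533)
reduced_model = precision_from_recall_f1(recall=0.8684, f1=0.8462)

print(f"implied precision, full feature set:  {full_model:.4f}")
print(f"implied precision, three-feature set: {reduced_model:.4f}")
```

Under these assumptions the implied precision is roughly 86.5% for the full model and roughly 82.5% for the three-feature model. That suggests the reduced input set gains a little recall at the cost of some precision.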

Conclusions: This research establishes a truck accident prediction model driven by macro-level data, which eases data collection while maintaining prediction accuracy.

Source journal
Traffic Injury Prevention (Public, Environmental & Occupational Health)
CiteScore: 3.60
Self-citation rate: 10.00%
Articles published: 137
Review time: 3 months
Journal description: The purpose of Traffic Injury Prevention is to bridge the disciplines of medicine, engineering, public health and traffic safety in order to foster the science of traffic injury prevention. The archival journal focuses on research, interventions and evaluations within the areas of traffic safety, crash causation, injury prevention and treatment. General topics within the journal's scope are driver behavior, road infrastructure, emerging crash avoidance technologies, crash and injury epidemiology, alcohol and drugs, impact injury biomechanics, vehicle crashworthiness, occupant restraints, pedestrian safety, evaluation of interventions, economic consequences and emergency and clinical care with specific application to traffic injury prevention. The journal includes full length papers, review articles, case studies, brief technical notes and commentaries.