Analyzing crash injury severities with deep learning and advanced statistical models: An assessment of methodological challenges

IF 12.6 | Zone 1 (Engineering Technology) | JCR Q1: PUBLIC, ENVIRONMENTAL & OCCUPATIONAL HEALTH

Analytic Methods in Accident Research Pub Date : 2025-09-08 DOI:10.1016/j.amar.2025.100405

MohammadAli Seyfi , Amir Mohammad Karimi Mamaghan , Ali Behnood , Fred Mannering

Analytic Methods in Accident Research, Volume 48, Article 100405. Source: https://www.sciencedirect.com/science/article/pii/S2213665725000363

Citations: 0

Abstract

In this research, statistical and deep learning models are applied to determine factors that affect motorcycle crash-injury severities. Four methodological challenges are considered: 1) imbalanced data (because fatal injuries are an exceedingly small portion of all resulting injury outcomes); 2) unobserved heterogeneity (because many unobserved factors will influence resulting injury severities); 3) quantification of variable effects; and 4) the possibility of temporally shifting relationships among variables. Convolutional neural networks and deep neural networks are the deep learning models considered, and a random parameters logit model with heterogeneity in means and variances is the statistical model considered. Extensive experimentation indicated that, among the deep learning models, data imbalance and unobserved heterogeneity were best handled by a Bayesian deep neural network with a random generator and a weighted loss function. With statistical modeling indicating significant shifts in model parameters over time, the data were segmented by year and both statistical and deep learning models were estimated. While techniques are available for deep learning to potentially handle data imbalance and unobserved heterogeneity, the quantification of variable effects and temporal shifts remains a challenge. For example, a comparison of variable effects shows that the deep learning estimates are generally inconsistent with the plausible values generated by the statistical models in terms of magnitude, and occasionally in terms of direction, indicating a need for improvements in deep-learning variable-effect extraction methods. The findings also show the need for future work to isolate the effect of complex temporal relationships, which are currently embedded in deep learning approaches. The segmentation of data that has been used in statistical models to isolate temporal effects, and even the use of all data with newly defined time-dependent variables, may not be a viable deep learning option due to the potential loss in predictive performance.
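As a rough illustration of the weighted loss function mentioned in the abstract, the sketch below computes a class-weighted cross-entropy on a toy three-class severity outcome. It is not the authors' implementation: the inverse-frequency weighting scheme, the function names, and the toy labels are all assumptions made for this example, and the Bayesian network and random generator from the paper are not reproduced here.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed scheme: weight_c = n_samples / (n_classes * count_c).

    Rare classes (e.g., fatal injuries) receive larger weights.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the weight sum."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical toy data: 0 = no injury, 1 = injury, 2 = fatal.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(labels)

# A model that is confident everywhere except on the single fatal crash.
good = [0.9, 0.05, 0.05]
probs = [good] * 6 + [[0.05, 0.9, 0.05]] * 3 + [[0.6, 0.3, 0.1]]

unweighted = weighted_cross_entropy(probs, labels, {0: 1.0, 1: 1.0, 2: 1.0})
weighted = weighted_cross_entropy(probs, labels, weights)
print(f"unweighted loss: {unweighted:.3f}, weighted loss: {weighted:.3f}")
```

With these weights, the single poorly predicted fatal observation contributes a third of the total weight instead of a tenth, so the weighted loss penalizes the model more heavily for missing the rare class than the unweighted loss does.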

Source journal

Analytic Methods in Accident Research Multiple-

CiteScore

22.10

Self-citation rate

34.10%

Publication count

Review time

24 days

About the journal: Analytic Methods in Accident Research is a journal that publishes articles related to the development and application of advanced statistical and econometric methods in studying vehicle crashes and other accidents. The journal aims to demonstrate how these innovative approaches can provide new insights into the factors influencing the occurrence and severity of accidents, thereby offering guidance for implementing appropriate preventive measures. While the journal primarily focuses on the analytic approach, it also accepts articles covering various aspects of transportation safety (such as road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures lead to property damage or bodily harm.