Analyzing crash injury severities with deep learning and advanced statistical models: An assessment of methodological challenges
MohammadAli Seyfi, Amir Mohammad Karimi Mamaghan, Ali Behnood, Fred Mannering
Analytic Methods in Accident Research, Volume 48, Article 100405. Published 2025-09-08.
DOI: 10.1016/j.amar.2025.100405
https://www.sciencedirect.com/science/article/pii/S2213665725000363
Abstract
In this research, statistical and deep learning models are applied to determine the factors that affect motorcycle crash-injury severities. Four methodological challenges are considered: 1) imbalanced data (because fatal injuries are an exceedingly small portion of all injury outcomes); 2) unobserved heterogeneity (because many unobserved factors influence resulting injury severities); 3) quantification of variable effects; and 4) the possibility of temporally shifting relationships among variables. Convolutional neural networks and deep neural networks are the deep learning models considered, and the random parameters logit model with heterogeneity in means and variances is the statistical model considered. Extensive experimentation indicated that, among the deep learning models, data imbalance and unobserved heterogeneity could best be handled by a Bayesian deep neural network with a random generator and a weighted loss function. Because the statistical models indicated significant shifts in parameters over time, the data were segmented by year, and both statistical and deep learning models were estimated on the segmented data. While deep learning offers techniques that can potentially handle data imbalance and unobserved heterogeneity, quantifying variable effects and temporal shifts remains a challenge. For example, a comparison of variable effects shows that the deep learning estimates are generally inconsistent with the plausible values produced by the statistical models in magnitude, and occasionally in direction, indicating a need for better methods of extracting variable effects from deep learning models.
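The abstract names a weighted loss function as one ingredient for handling the rarity of fatal outcomes but does not say how the weights were chosen. As a minimal sketch, assuming inverse-frequency class weights (a common choice, not necessarily the authors'), the Python below computes per-class weights and a class-weighted cross-entropy normalized by the total weight of the observed labels. It does not reproduce the Bayesian network or its random generator. The function names and severity labels are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight class c by n / (k * n_c), so rare classes get larger weights.

    Hypothetical helper: inverse-frequency weighting is an assumed scheme,
    not one stated in the abstract.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Class-weighted negative log-likelihood, averaged over the total weight.

    probs:   list of dicts mapping class -> predicted probability
    labels:  list of observed classes, aligned with probs
    weights: dict mapping class -> weight
    """
    total_loss = 0.0
    total_weight = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clip to avoid log(0) when a model assigns zero probability.
        total_loss += -w * math.log(max(p[y], eps))
        total_weight += w
    return total_loss / total_weight


# Hypothetical severity labels in which fatal outcomes are rare.
labels = ["no_injury"] * 90 + ["injury"] * 9 + ["fatal"] * 1
weights = inverse_frequency_weights(labels)
print({c: round(w, 3) for c, w in weights.items()})
# {'no_injury': 0.37, 'injury': 3.704, 'fatal': 33.333}
```

With these weights each class contributes the same total weight (100/3 here), so a confidently wrong prediction on the single fatal crash is penalized 90 times more heavily than the same error on a no-injury crash.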
The findings also point to the need for future work to isolate the effects of complex temporal relationships that are currently embedded in deep learning approaches. Segmenting the data, as is done in statistical models to isolate temporal effects, or even using all of the data with newly defined time-dependent variables, may not be a viable option for deep learning because of the potential loss in predictive performance.
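The abstract reports that the statistical models indicated significant parameter shifts over time but does not name the test used. One widely used check in this literature is a likelihood ratio test that compares a model estimated on all periods pooled against separate models for each period: X^2 = -2 [LL(pooled) - sum over periods t of LL(period t)], which is chi-square distributed with degrees of freedom equal to the total parameter count of the period models minus the parameter count of the pooled model. The sketch below assumes that form and takes converged log-likelihoods as inputs. To stay within the standard library, it computes the p-value from a series expansion of the regularized incomplete gamma function. The function names and all numeric values are invented for illustration.

```python
import math


def chi2_sf(stat, df, tol=1e-15, max_iter=10_000):
    """Upper-tail probability P(X >= stat) for X ~ chi-square(df).

    Uses the series for the regularized lower incomplete gamma function
    P(a, x) with a = df/2 and x = stat/2; adequate for moderate statistics.
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    term = 1.0 / a
    total = term
    ap = a
    for _ in range(max_iter):
        ap += 1.0
        term *= x / ap
        total += term
        if term < total * tol:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def temporal_stability_lr(ll_pooled, k_pooled, ll_by_period, k_by_period):
    """Pooled-versus-segmented likelihood ratio test (hypothetical helper).

    ll_pooled:    converged log-likelihood of the all-periods model
    k_pooled:     number of estimated parameters in that model
    ll_by_period: converged log-likelihoods of the separate period models
    k_by_period:  parameter counts of those period models
    Returns (statistic, degrees of freedom, p-value).
    """
    if len(ll_by_period) != len(k_by_period):
        raise ValueError("need one parameter count per period model")
    stat = -2.0 * (ll_pooled - sum(ll_by_period))
    df = sum(k_by_period) - k_pooled
    return stat, df, chi2_sf(stat, df)


# Invented log-likelihoods for three yearly models (not from the paper).
stat, df, p = temporal_stability_lr(
    -5000.0, 20, [-1700.0, -1650.0, -1610.0], [18, 17, 19]
)
print(stat, df, p < 0.01)  # 80.0 34 True
```

A small p-value rejects the hypothesis that parameters are stable across periods, and this is the kind of result that motivates estimating separate models by year. The series form loses precision for p-values near machine epsilon. In practice a library routine such as SciPy's chi-square survival function is preferable.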
Journal description:
Analytic Methods in Accident Research is a journal that publishes articles related to the development and application of advanced statistical and econometric methods in studying vehicle crashes and other accidents. The journal aims to demonstrate how these innovative approaches can provide new insights into the factors influencing the occurrence and severity of accidents, thereby offering guidance for implementing appropriate preventive measures. While the journal primarily focuses on the analytic approach, it also accepts articles covering various aspects of transportation safety (such as road, pedestrian, air, rail, and water safety), construction safety, and other areas where human behavior, machine failures, or system failures lead to property damage or bodily harm.