Pre-crash injury risk prediction with guaranteed confidence level: a conformal and interpretable framework
Junhao Wei, Yusuke Miyazaki, Fusako Sato
Traffic Injury Prevention, pp. 1-11, published 2025-08-19. DOI: 10.1080/15389588.2025.2538725
Citations: 0
Abstract
Objective: Pre-crash injury risk prediction is crucial for proactive safety measures. However, traditional models output single-point predictions without explaining the reasons behind their decisions, and they often lack the interpretability and reliable uncertainty estimates needed to reflect potential risk distributions. These drawbacks limit their practical effectiveness in mitigating injury severity. To overcome these limitations, this study develops a novel framework that, using only pre-crash data, outputs potential risk distributions and their corresponding probabilities, delivering probabilistic outputs with a statistically guaranteed 90% confidence level. With this framework, we aim to provide a more convincing and interpretable analysis of injury distributions and their underlying causes in traffic accidents, ultimately offering data-driven guidance for injury mitigation strategies.
Methods: Data from the National Automotive Sampling System Crashworthiness Data System (NASS-CDS) and the Crash Investigation Sampling System (CISS) were used, incorporating 28 pre-crash risk factors. Several machine learning models, including ensemble methods and the deep learning model TabNet, were evaluated. To address the substantial class imbalance, particularly the small number of serious injury cases, various resampling strategies were applied. The core contribution is the integration of conformal prediction methods, both naive and class-conditional, to generate prediction sets at a 90% confidence level. Model performance was assessed via global evaluation metrics such as the F1-score and via serious injury recall, and interpretability was enhanced using explainable machine learning and statistical analysis.
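The abstract names two conformal variants but gives no implementation details. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes a classifier that outputs class probabilities (for instance one of the ensemble models or TabNet), a held-out calibration set, the common nonconformity score 1 − p(true class), and integer-coded severity classes. The 90% confidence level corresponds to `alpha = 0.1`. The "naive" (marginal) variant here uses one threshold for all classes and targets about 90% coverage averaged over all cases; the class-conditional variant computes a separate threshold from each class's calibration scores, so that coverage holds within every class, including the rare serious-injury class. The paper's own "naive" method may be defined differently, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a split-conformal sketch, NOT the authors' implementation.
# Assumes `probs` is an (n, K) array of predicted class probabilities and
# `labels` holds integer class indices 0..K-1.

def nonconformity(probs, labels):
    """Score = 1 - probability the model gives to the true class (higher = stranger)."""
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_threshold(scores, alpha):
    """k-th smallest calibration score with k = ceil((n + 1)(1 - alpha))."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this level: never exclude the label.
        return np.inf
    return np.sort(scores)[k - 1]

def naive_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Marginal ("naive") conformal: one threshold shared by every class."""
    qhat = conformal_threshold(nonconformity(cal_probs, cal_labels), alpha)
    return (1.0 - test_probs) <= qhat  # (n_test, K) boolean prediction sets

def class_conditional_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Class-conditional conformal: one threshold per class, from that class's scores."""
    scores = nonconformity(cal_probs, cal_labels)
    sets = np.zeros(test_probs.shape, dtype=bool)
    for k in range(test_probs.shape[1]):
        qhat_k = conformal_threshold(scores[cal_labels == k], alpha)
        sets[:, k] = (1.0 - test_probs[:, k]) <= qhat_k
    return sets
```

Each row of the returned boolean matrix is one prediction set; a row that flags both a minor and a serious class marks a case the model cannot resolve at the 90% level. Per-class thresholds tend to widen sets for small or poorly separated classes, which is one plausible route to higher minority-class recall, traded against larger sets.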
Results: Comparative experiments show that the proposed framework achieves nearly 90% prediction coverage and a 70.3% recall rate for serious injuries, a performance significantly higher than that reported in related studies. Further model interpretation highlights key risk factors, such as intersection relevance, crash type, and speed limit, and shows how they affect injury severity prediction.
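The abstract reports coverage and serious-injury recall without defining them for set-valued outputs. A hedged sketch of the usual definitions follows, assuming a boolean prediction-set matrix (rows = cases, columns = classes), integer labels, and a hypothetical coding in which class 2 denotes serious injury. It computes empirical coverage, average set size, serious-injury recall both for point predictions and for prediction sets (the abstract does not say which of these the 70.3% figure refers to), and macro-averaged F1. Helper names are hypothetical.

```python
import numpy as np

# Hypothetical evaluation helpers; class coding (e.g., 2 = serious injury) is assumed.

def empirical_coverage(pred_sets, labels):
    """Fraction of cases whose prediction set contains the true label."""
    return pred_sets[np.arange(len(labels)), labels].mean()

def average_set_size(pred_sets):
    """Mean number of labels per prediction set (smaller = more informative)."""
    return pred_sets.sum(axis=1).mean()

def class_recall(y_true, y_pred, cls):
    """Point-prediction recall for one class, e.g., the serious-injury class."""
    mask = y_true == cls
    return (y_pred[mask] == cls).mean() if mask.any() else float("nan")

def set_recall(pred_sets, labels, cls):
    """Share of true `cls` cases whose prediction set includes `cls`."""
    mask = labels == cls
    return pred_sets[mask, cls].mean() if mask.any() else float("nan")

def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for k in range(n_classes):
        tp = np.sum((y_pred == k) & (y_true == k))
        fp = np.sum((y_pred == k) & (y_true != k))
        fn = np.sum((y_pred != k) & (y_true == k))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(f1s))
```

Reporting coverage together with average set size matters: trivially large sets reach any coverage target, so a coverage figure near 90% is informative only alongside how many classes each set contains.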
Conclusions: The proposed framework demonstrates significant potential for pre-crash injury risk prediction by integrating conformal prediction techniques with machine learning models. In addition to achieving nearly 90% prediction coverage and a 70.3% recall rate for serious injuries, the framework offers enhanced interpretability by quantifying prediction uncertainty and identifying key risk factors. Unlike traditional methods, the framework remains valid under distribution shifts and combines uncertainty estimation with model interpretability. Together, these advantages lay a foundation for developing proactive traffic safety applications and formulating data-driven road safety policies.
About the journal:
The purpose of Traffic Injury Prevention is to bridge the disciplines of medicine, engineering, public health and traffic safety in order to foster the science of traffic injury prevention. The archival journal focuses on research, interventions and evaluations within the areas of traffic safety, crash causation, injury prevention and treatment.
General topics within the journal's scope are driver behavior, road infrastructure, emerging crash avoidance technologies, crash and injury epidemiology, alcohol and drugs, impact injury biomechanics, vehicle crashworthiness, occupant restraints, pedestrian safety, evaluation of interventions, economic consequences and emergency and clinical care with specific application to traffic injury prevention. The journal includes full length papers, review articles, case studies, brief technical notes and commentaries.