Xiaowei Gao, Xinke Jiang, Dingyi Zhuang, James Haworth, Shenhao Wang, Ilya Ilyankou, Huanfa Chen
Accident Analysis & Prevention, Volume 216, Article 108020. Published 2025-04-06. DOI: 10.1016/j.aap.2025.108020. https://www.sciencedirect.com/science/article/pii/S000145752500106X
Reliable imputation of incomplete crash data for predicting driver injury severity
Traffic crash analyses are frequently hampered by incomplete documentation, particularly in standardised full records of multi-party crashes. Traditional imputation methods such as MICE and KNN, while effective for single-category analyses, fail to address the complex interdependencies inherent in standardised crash records that involve different types of road user. This study introduces a novel graph-based imputation framework that integrates an Inexact Match Bipartite-Graph with Contrastive Learning in a Transformer-GNN architecture, providing a unified way to handle missing data across crash types in a complete crash record database. Tests on UK traffic crash records (2018–2022) demonstrate the robust performance of the imputation model, with imputation accuracy between 94.74% and 99.24% across missing data rates from 10% to 70%. In the downstream task of classifying injury severity, our imputed data set proved highly reliable, achieving a G-mean score of 62.19% across imbalanced severity levels even at a missing rate of 70%. Furthermore, SHAP-based explanations showed that imputation preserved the most important contributing factors. These results validate our framework's effectiveness in maintaining both data integrity and essential relationship structures in standardised crash records, advancing traffic safety analysis through improved imputation methodology.
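The two evaluation quantities named in the abstract can be illustrated with a small sketch. The abstract does not state the exact definitions used in the paper, so this assumes the common ones: G-mean as the geometric mean of per-class recall, and imputation accuracy as the share of artificially hidden categorical entries that are recovered exactly. The function names, the class labels and the most-frequent-value imputer are hypothetical stand-ins for illustration only; they are not the Transformer-GNN framework described above.

```python
import math
import random
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return math.prod(recalls) ** (1.0 / len(recalls))


def mask_values(values, missing_rate, seed=0):
    """Hide a fixed fraction of entries (set to None), chosen uniformly at random."""
    rng = random.Random(seed)
    n_missing = round(len(values) * missing_rate)
    hidden = set(rng.sample(range(len(values)), n_missing))
    masked = [None if i in hidden else v for i, v in enumerate(values)]
    return masked, sorted(hidden)


def mode_impute(masked):
    """Baseline imputer (assumption): fill gaps with the most frequent observed value."""
    mode = Counter(v for v in masked if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in masked]


def imputation_accuracy(original, imputed, hidden):
    """Share of hidden entries whose imputed value equals the original value."""
    return sum(original[i] == imputed[i] for i in hidden) / len(hidden)


# Imbalanced toy labels: recall is 8/8 for "slight" and 1/2 for "serious".
y_true = ["slight"] * 8 + ["serious"] * 2
y_pred = ["slight"] * 8 + ["slight", "serious"]
print(round(g_mean(y_true, y_pred), 4))  # sqrt(1.0 * 0.5) -> 0.7071

# Hide 70% of a toy categorical column, impute it, and score the recovery.
column = ["dry"] * 10
masked, hidden = mask_values(column, missing_rate=0.7)
print(imputation_accuracy(column, mode_impute(masked), hidden))
```

Because G-mean multiplies the per-class recalls, a classifier that ignores a rare severity class scores zero regardless of its overall accuracy, which is why the metric suits imbalanced severity levels.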
About the journal:
Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.