Can AV crash datasets provide more insight if missing information is supplemented? Employing Generative Adversarial Imputation Networks to Tackle Data Quality Issues

IF 7.6 · Zone 1 (Engineering & Technology) · JCR Q1 (TRANSPORTATION SCIENCE & TECHNOLOGY)

Transportation Research Part C-Emerging Technologies · Pub Date: 2025-05-17 · DOI: 10.1016/j.trc.2025.105154

Hongliang Ding, Zhuo Liu, Hanlong Fu, Xiaowen Fu, Tiantian Chen, Jinhua Zhao

Volume 176, Article 105154. Publisher page: https://www.sciencedirect.com/science/article/pii/S0968090X25001585

Citations: 0

Abstract

The growing prevalence of autonomous vehicles (AVs) offers new opportunities for enhancing traffic efficiency. However, AVs still face significant challenges that affect their safety and their effectiveness in preventing accidents. Real-world operational data are therefore essential for identifying the factors that contribute to AV crashes. Even so, the analysis of AV crashes is still hampered by scarce data, missing information, and underreporting, which reduce its accuracy and comprehensiveness. To address this challenge, a method based on Generative Adversarial Networks (GANs) was used for data imputation, leveraging their advantage in handling heterogeneous data. The proposed imputation approach was evaluated against two established methods: conventional case deletion and Random Forest (RF) imputation. The datasets produced by these three methods were modelled using the random parameters logit model with heterogeneity in means. Data from the California Department of Motor Vehicles (DMV) and the National Highway Traffic Safety Administration (NHTSA) covering 2021–2023 were used. Our results showed that the model fitted to data processed with Generative Adversarial Imputation Networks (GAIN) outperformed the other candidate methods in terms of goodness of fit, predictive accuracy, and factor interpretation. Our results suggest that speed limit, roadway type, head-on crashes, and takeover of ADAS-equipped vehicles are positively associated with serious injury crashes. In contrast, ADS engagement and crashes with fixed objects are negatively associated with serious injury crashes. Additionally, heterogeneous effects of posted speed limits and ADS engagement on AV crash severity were captured, providing deeper insight into their implications.
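To make the contrast between case deletion and imputation concrete, the following is a minimal sketch and not the authors' pipeline. The toy records, field names, and helper functions are all hypothetical. A per-column mean stands in for the generator's proposals, so no GAN or Random Forest is trained here. The sketch only shows the bookkeeping the two strategies share: a mask of observed entries, dropping incomplete rows, and filling only the missing entries while leaving observed ones untouched.

```python
from statistics import mean

# Hypothetical toy records: (speed_limit_mph, ads_engaged, serious_injury).
# None marks a missing field; these values are invented for illustration.
records = [
    (25.0, 1.0, 0.0),
    (None, 0.0, 1.0),
    (45.0, None, 1.0),
    (35.0, 1.0, 0.0),
    (None, None, 0.0),
]


def build_mask(rows):
    """Return 1 where a value is observed and 0 where it is missing."""
    return [[0 if v is None else 1 for v in row] for row in rows]


def case_deletion(rows):
    """Keep only rows with no missing fields (listwise deletion)."""
    return [row for row in rows if all(v is not None for v in row)]


def column_means(rows):
    """Mean of observed values per column: a placeholder for a learned imputer."""
    n_cols = len(rows[0])
    return [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]


def combine(rows, mask, proposals):
    """Keep observed entries; fill missing ones from the per-column proposals."""
    return [
        [row[j] if mask[i][j] else proposals[j] for j in range(len(row))]
        for i, row in enumerate(rows)
    ]


mask = build_mask(records)
complete_cases = case_deletion(records)
imputed = combine(records, mask, column_means(records))

# Rows available to a downstream model: original, after deletion, after imputation.
print(len(records), len(complete_cases), len(imputed))
```

On this toy input, deletion keeps two of five rows, while imputation keeps all five. A mean fill gives fractional values for binary fields such as `ads_engaged`, which is one reason a learned imputer that respects variable types may be preferable for heterogeneous crash data.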




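Source journal

Transportation Research Part C-Emerging Technologies (Engineering & Technology: Transportation Science & Technology)

CiteScore: 15.80

Self-citation rate: 12.00%

Articles published: 332

Review time: 64 days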

About the journal: Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.