Hongliang Ding, Zhuo Liu, Hanlong Fu, Xiaowen Fu, Tiantian Chen, Jinhua Zhao
Transportation Research Part C: Emerging Technologies, Volume 176, Article 105154. Published 2025-05-17. DOI: 10.1016/j.trc.2025.105154. Impact factor: 7.6 (JCR Q1, Transportation Science & Technology). https://www.sciencedirect.com/science/article/pii/S0968090X25001585
Can AV crash datasets provide more insight if missing information is supplemented? Employing Generative Adversarial Imputation Networks to Tackle Data Quality Issues
The growing prevalence of autonomous vehicles (AVs) offers new opportunities for enhancing traffic efficiency. However, AVs still face significant challenges that affect their safety and their effectiveness in preventing accidents. Real-world operational data are therefore essential for identifying the factors that contribute to AV crashes. Despite this, the analysis of AV crashes is still hampered by a lack of data, missing information, and underreporting, all of which reduce its accuracy and comprehensiveness. To address this challenge, a method based on Generative Adversarial Networks (GANs) was used for data imputation, leveraging their advantage in handling heterogeneous data. The proposed imputation approach was evaluated against two established methods, namely conventional case deletion and Random Forest (RF) imputation. The datasets obtained from these three methods were modelled using a random parameters logit model with heterogeneity in means. Data from the California Department of Motor Vehicles (DMV) and the National Highway Traffic Safety Administration (NHTSA) covering 2021–2023 were used. The model based on data processed with Generative Adversarial Imputation Networks (GAIN) outperformed the other candidate methods in terms of fit, predictive accuracy, and factor interpretation. The results suggest that speed limit, roadway type, head-on crashes, and takeover of ADAS-equipped vehicles are positively associated with serious injury crashes. In contrast, ADS engagement and crashes with fixed objects are negatively associated with serious injury crashes. Additionally, heterogeneous effects of posted speed limits and ADS engagement on AV crash severity were captured, providing deeper insight into their implications.
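To make the contrast between case deletion and imputation concrete, the following is a minimal sketch, not the authors' implementation. It shows listwise deletion next to the mask-based combination step used in GAIN-style imputation, where observed values are kept and only missing entries are taken from a generator's output. The toy records, the function names, and the column-mean "generator" are all assumptions made for illustration; a real GAIN generator is a neural network trained adversarially against a discriminator.

```python
import numpy as np


def case_deletion(x):
    """Listwise deletion: keep only rows with no missing values."""
    return x[~np.isnan(x).any(axis=1)]


def combine_with_mask(x, generated):
    """Keep observed entries of x; fill missing ones from `generated`."""
    mask = ~np.isnan(x)  # True where a value was observed
    return np.where(mask, x, generated)


def column_mean_generator(x):
    """Hypothetical stand-in for a trained generator (assumption):
    proposes the observed column mean for every cell."""
    means = np.nanmean(x, axis=0)
    return np.broadcast_to(means, x.shape)


# Toy crash records invented for illustration:
# columns = [speed limit, ADS engaged (0/1), fixed-object crash (0/1)]
records = np.array([
    [25.0, 1.0, 0.0],
    [np.nan, 0.0, 1.0],
    [45.0, np.nan, 0.0],
    [65.0, 1.0, 1.0],
])

deleted = case_deletion(records)
imputed = combine_with_mask(records, column_mean_generator(records))

print(deleted.shape[0], "of", records.shape[0], "rows survive case deletion")
print(imputed)
```

In this toy example case deletion discards every record that has any missing field, while the imputed array keeps all records and leaves the observed values untouched, which is the property that lets a downstream severity model use the full sample.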
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.