Autonomous vehicle crash risk modeling by integrating data augmentation and two-layer stacking

IF 9.1 · CAS Zone 1 (Computer Science) · JCR Q1, Computer Science, Interdisciplinary Applications
Leipeng Zhu, Zhiqing Zhang, Yongnan Zhang, Jingyang Yu, Hongjia Wang
Computers in Industry, vol. 171, Article 104320. Published 2025-06-03. DOI: 10.1016/j.compind.2025.104320
https://www.sciencedirect.com/science/article/pii/S0166361525000855
Citations: 0

Abstract

Autonomous vehicle (AV) technology aims to eliminate traffic crashes caused by driver errors, but its adoption has introduced new types of crashes. Because AV crash data are high-dimensional and sample sizes are limited, identifying underlying risk factors remains challenging, and crash prediction performance is often suboptimal. To address these issues, this study develops an interpretable data augmentation strategy and an optimized two-layer stacking algorithm, and integrates them into a unified framework that accurately identifies key crash contributing factors and significantly improves predictive performance. The findings reveal that: 1) AV crashes vary significantly in their temporal distributions but follow consistent spatial agglomeration patterns. 2) AV reliability decreases significantly in high-interaction scenarios, with peak travel times and uncertain road conditions identified as key contributing factors. 3) The data augmentation algorithm enhances key contributing factors and the feature crosses, which strengthens the model's ability to capture nonlinear relationships in crash data and improves predictive accuracy in small-sample scenarios, particularly for injury-related crashes. 4) The optimized two-layer stacking algorithm integrates the heterogeneous learning capabilities of models such as LightGBM and Random Forest, significantly improving the ability to recognize complex crash patterns. When combined with data augmentation, the framework achieves strong predictive performance, with precision and recall both reaching 0.92 and an area under the receiver operating characteristic curve of 0.96. Compared with existing machine learning approaches, the framework shows notable advantages in handling high-dimensional, small-sample AV crash data. It provides an effective solution for AV crash risk modeling and safety design, contributing to the development and implementation of safer intelligent transportation systems.
Source journal
Computers in Industry (Engineering and Technology: Computer Science, Interdisciplinary Applications)
CiteScore: 18.90
Self-citation rate: 8.00%
Articles published: 152
Review time: 22 days
Journal description: The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that:
• Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry;
• Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry;
• Foster connections or integrations across diverse application areas of ICT in industry.