Sentence-resampled BERT-CRF model for autonomous vehicle crash causality analysis from large-scale accident narrative text data.

IF 6.2 | Zone 1, Engineering & Technology | JCR Q1, ERGONOMICS
Accident; analysis and prevention Pub Date : 2025-10-01 Epub Date: 2025-08-07 DOI:10.1016/j.aap.2025.108184
Ruixu Pan, Quan Yuan, Jiaming Cao, Chonghao Zhang, Chengcheng Yu, Qian Liu, Chao Yang, Xingyu Liang
Volume 221, article 108184. Publication type: Journal Article.
Citations: 0

Abstract

As autonomous vehicles (AVs) become more widely used, understanding crash causality mechanisms is critical to improving the safety of AV use. Existing studies have primarily relied on structured data to analyze such causality; few have tried to identify causality from unstructured crash narratives, which are characterized by data imbalance and small sample sizes. Yet original crash narratives contain a wealth of latent information about AV crashes that can deepen the understanding of AV safety. This study proposes a Sentence-resampled BERT-CRF model combined with a DREAM-inspired hierarchical causal attribution framework to systematically analyze the causality mechanisms of AV crashes from original crash narratives. First, an annotation scheme combining "BIO" and "C-P-R-D" tags is designed to capture temporal causal relationships in crash narratives, and the BERT-CRF model is used to extract causal movement chains (CMCs). Meanwhile, the data imbalance problem is mitigated with a sentence-level resampling method, and the results show that the model reaches 98.03% accuracy on the complete dataset and maintains 96.14% accuracy with a small sample of 10%. Then, a two-tier causal attribution framework (5 categories and 52 elements) inspired by DREAM theory is developed to identify 16 categories of typical scenarios, with rear-end (48.57%) and lane-change (17.04%) collisions as the high-risk scenarios. In-depth analysis shows that rear-end crashes are mostly caused by the coupling of a conventional vehicle (CV) following the AV too closely (B5) and the AV's insufficiently decisive decision to slow down (A2), while lane-change crashes are associated with the CV's hazardous lane change (B2) and delays in the AV's intent recognition. The proposed framework bridges the gap between unstructured narrative data and structured causal inference, revealing human-computer interaction deficiencies, environment perception limitations, and roadway facility impacts as the core causal factors. These findings provide data-driven theoretical support for AV manufacturers to optimize sensing algorithms and for traffic authorities to develop corresponding regulations.
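
The abstract names the annotation scheme and the sentence-level resampling step but does not spell out either one. The Python sketch below shows one generic way such a step could look; it is not the authors' implementation. Several things in it are assumptions made for illustration: that the label set is the plain cross product of the B/I prefixes with the four type codes C, P, R and D; the helper names `tag_types` and `resample_sentences`; the oversampling rule (duplicate whole sentences until each type code appears in at least as many sentences as the most frequent one did originally); and the toy sentences, whose tag assignments are arbitrary and carry no claim about what each code means.

```python
import random
from collections import Counter

# Assumed label set: "O" plus B-/I- prefixes crossed with the four type
# codes named in the abstract. The paper's exact label inventory may differ.
TYPES = ["C", "P", "R", "D"]
LABELS = ["O"] + [f"{prefix}-{code}" for code in TYPES for prefix in ("B", "I")]


def tag_types(tags):
    """Return the set of type codes present in one sentence's tag sequence."""
    return {tag.split("-", 1)[1] for tag in tags if tag != "O"}


def resample_sentences(sentences, seed=0):
    """Oversample whole sentences (hypothetical rule, not the paper's).

    `sentences` is a list of (tokens, tags) pairs. Sentences containing an
    under-represented type code are duplicated at random until that code
    appears in at least as many sentences as the most frequent code did
    in the original data.
    """
    rng = random.Random(seed)
    counts = Counter(code for _, tags in sentences for code in tag_types(tags))
    if not counts:
        return list(sentences)
    target = max(counts.values())
    resampled = list(sentences)
    # Handle the rarest codes first.
    for code in sorted(counts, key=counts.get):
        pool = [s for s in sentences if code in tag_types(s[1])]
        while counts[code] < target:
            pick = rng.choice(pool)
            resampled.append(pick)
            # A duplicated sentence raises the count of every code it contains.
            counts.update(tag_types(pick[1]))
    return resampled


# Toy data with arbitrary tag assignments, for illustration only.
toy = [
    (["AV", "stopped"], ["B-C", "I-C"]),
    (["CV", "approached"], ["B-C", "I-C"]),
    (["CV", "struck", "AV"], ["B-R", "I-R", "I-R"]),
    (["at", "night"], ["O", "O"]),
]
balanced = resample_sentences(toy)
```

Resampling at the sentence level keeps each tag sequence intact, which matters for a CRF layer that models transitions between adjacent labels; resampling individual tokens would break those sequences.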

Source journal
CiteScore: 11.90
Self-citation rate: 16.90%
Articles published: 264
Review time: 48 days
Journal description: Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.