Automating Pedestrian Crash Typology Using Transformer Models

Amir Hossein Oliaee, Subasish Das, Minh Le
Transportation Research Record: Journal of the Transportation Research Board (published 2024-08-02). DOI: 10.1177/03611981241260691

Abstract

The Pedestrian and Bicycle Crash Analysis Tool (PBCAT) was developed to help analysts accurately analyze and understand the causes of traffic crashes involving pedestrians and bicyclists. However, entering data into the tool manually is labor intensive, so a more automated method is needed for large data sets. This study developed deep-learning models to automate the classification of crash types. In addition, the PBCAT classification typology can produce imbalanced data sets, so the class imbalance inherent in the native typology must be tackled directly. Addressing this issue can significantly improve researchers' ability to harness emerging large language models. This matters all the more as large language models such as transformer models become increasingly accessible, opening promising opportunities in transportation safety research. As a case study, this study used the text narratives of police reports on pedestrian crashes in three major Texas cities from 2018 to 2020. It evaluated how effectively classification loss functions, adjustments to the classification typology, and model pre-training mitigate the adverse effects of data set imbalance. Our tests indicate that better classification results can be achieved by using the balanced categorical cross entropy (BCE) loss function and a model with more robust pre-training. This effect was noticeable when each class had a sufficiently large sample size. For smaller data sets, a tiered classification system with fewer classes and more distinct text sentiment is recommended.
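The abstract credits a balanced categorical cross entropy loss with better results on imbalanced crash types but does not state its formula. The sketch below assumes one common formulation: inverse-frequency class weights w_c = N / (K * n_c) (the "balanced" heuristic also used by scikit-learn's `compute_class_weight`), applied to softmax cross entropy and normalized by the summed sample weights. The authors' exact definition may differ, and the crash-type labels ("crossing", "turning", "backing") are hypothetical placeholders, not PBCAT categories.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Assumption: "balanced" means inverse-frequency class weights,
# w_c = N / (K * n_c); the paper's exact formulation may differ.
def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * n_c) for c, n_c in counts.items()}

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def balanced_cross_entropy(
    batch_logits: Sequence[Sequence[float]],
    batch_labels: Sequence[str],
    classes: Sequence[str],
    weights: Dict[str, float],
) -> float:
    """Class-weighted cross entropy, averaged over the summed sample weights."""
    index = {c: i for i, c in enumerate(classes)}
    loss_sum, weight_sum = 0.0, 0.0
    for logits, label in zip(batch_logits, batch_labels):
        p_true = softmax(logits)[index[label]]
        w = weights[label]
        loss_sum += -w * math.log(p_true)
        weight_sum += w
    return loss_sum / weight_sum

if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced training labels (not PBCAT data).
    train = ["crossing"] * 6 + ["turning"] * 3 + ["backing"]
    classes = ["crossing", "turning", "backing"]
    weights = balanced_class_weights(train)
    print(weights)  # the rare "backing" class gets 6x the weight of "crossing"

    # One correct prediction on a common class, one miss on the rare class.
    logits = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    labels = ["crossing", "backing"]
    uniform = {c: 1.0 for c in classes}
    print(balanced_cross_entropy(logits, labels, classes, uniform))
    print(balanced_cross_entropy(logits, labels, classes, weights))
```

In this toy batch the misclassified rare-class sample dominates the weighted loss, which rises from roughly 1.24 (unweighted) to roughly 1.95. The normalization by summed weights matches `torch.nn.CrossEntropyLoss(weight=...)` with its default `reduction="mean"`. The same per-class weights could therefore be passed there when fine-tuning a transformer classifier. This remains a sketch, not the authors' training code.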