Automating Pedestrian Crash Typology Using Transformer Models

Transportation Research Record: Journal of the Transportation Research Board Pub Date : 2024-08-02 DOI:10.1177/03611981241260691

Amir Hossein Oliaee, Subasish Das, Minh Le



Abstract


To accurately analyze and understand the causes of traffic crashes involving pedestrians and bicyclists, the Pedestrian and Bicycle Crash Analysis Tool (PBCAT) was developed. However, manual data entry in the tool is labor intensive. Thus, a more automated method is needed for large data sets. This study developed deep-learning models to automate the classification of crash types. Additionally, the PBCAT’s classification typology can lead to imbalanced data sets, underscoring the need to actively tackle the issue of imbalanced native classification. By addressing this issue, researchers can significantly enhance their ability to harness the potential of emerging large language models. This endeavor becomes even more crucial as large language models like transformer models become increasingly accessible, offering promising opportunities in transportation safety research. This study focused on police reports’ text narratives concerning pedestrian crashes in three major cities in Texas from 2018 to 2020 as a case study. It evaluated the effectiveness of classification loss functions, classification typology adjustments, and model pre-training in addressing the adverse effects of data set imbalance. Our tests indicate that better classification results can be achieved by using the balanced categorical cross entropy (BCE) loss function and using a model with a more robust pre-training. This effect was noticeable when a large enough sample size was present for each class. In the case of smaller data sets, a tiered classification system was recommended, with fewer classes and more distinct text sentiment.
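The abstract does not define its balanced categorical cross entropy loss, so the following is only a minimal sketch of one common way to balance cross entropy against class imbalance: weighting each class by its inverse frequency. The function names, the integer class labels, and the weighting formula (n_samples / (n_classes * class_count)) are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Assumed weighting scheme: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, frequent classes smaller ones.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits_batch, labels, weights):
    """Class-weighted cross entropy, normalised by the sum of the weights used."""
    weighted_loss = 0.0
    weight_sum = 0.0
    for logits, label in zip(logits_batch, labels):
        prob_true = softmax(logits)[label]
        w = weights[label]
        weighted_loss += -w * math.log(prob_true)
        weight_sum += w
    return weighted_loss / weight_sum


# Hypothetical toy data: three narratives of a common crash type (0), one rare (1).
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# Hypothetical model outputs that always favour the common class.
logits_batch = [[2.0, 0.0]] * 4
plain = weighted_cross_entropy(logits_batch, labels, {0: 1.0, 1: 1.0})
balanced = weighted_cross_entropy(logits_batch, labels, weights)
```

In this toy case the balanced loss is larger than the unweighted one, because the single misclassified rare-class sample counts for as much as the three common-class samples combined. That is the intended effect: a model that ignores rare crash types is penalised more heavily.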
