Authors: Amir Hossein Oliaee, Subasish Das, Minh Le
Journal: Transportation Research Record: Journal of the Transportation Research Board
Published: 2024-08-02
DOI: 10.1177/03611981241260691 (https://doi.org/10.1177/03611981241260691)
Automating Pedestrian Crash Typology Using Transformer Models
The Pedestrian and Bicycle Crash Analysis Tool (PBCAT) was developed to accurately analyze and understand the causes of traffic crashes involving pedestrians and bicyclists. However, manual data entry in the tool is labor intensive, so a more automated method is needed for large data sets. This study developed deep-learning models to automate the classification of crash types. The PBCAT’s classification typology can also produce imbalanced data sets, so class imbalance in the tool’s native typology needs to be addressed directly. Doing so matters more as large language models such as transformer models become increasingly accessible and open promising opportunities in transportation safety research. As a case study, this study used the text narratives of police reports on pedestrian crashes in three major cities in Texas from 2018 to 2020. It evaluated how effectively classification loss functions, adjustments to the classification typology, and model pre-training address the adverse effects of data set imbalance. The tests indicate that better classification results can be achieved by using the balanced categorical cross entropy (BCE) loss function and a model with more robust pre-training. This effect was noticeable when each class had a large enough sample size. For smaller data sets, a tiered classification system with fewer classes and more distinct text sentiment was recommended.
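The abstract does not give the exact form of the balanced categorical cross entropy loss, so the following is only a minimal sketch of one common way to balance cross entropy across classes: weight each sample by the inverse frequency of its true class. The function names, the weighting formula, and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights (assumed scheme): n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def balanced_cross_entropy(logits_batch, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    num = 0.0
    den = 0.0
    for logits, y in zip(logits_batch, labels):
        p_true = softmax(logits)[y]
        w = weights[y]
        num += -w * math.log(p_true)
        den += w
    return num / den


# Hypothetical toy data: class 0 is common, class 1 is rare.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = class_weights(labels)

# A model that always favors the majority class.
logits_batch = [[2.0, 0.0] for _ in labels]
plain = sum(-math.log(softmax(z)[y]) for z, y in zip(logits_batch, labels)) / len(labels)
balanced = balanced_cross_entropy(logits_batch, labels, weights)
print(f"unweighted loss: {plain:.3f}, balanced loss: {balanced:.3f}")
```

With these weights, each class contributes the same total weight to the loss, so a model that ignores the rare class is penalized more heavily than under the unweighted mean.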