Examining traffic violations in severe casualty truck crashes: A text mining and reliable network analysis of narrative reports
Yunfei Zhao, Kai Kang, Wenjian Jia, Zhe Guo, Jie Zhang, Tong Zhu
Traffic Injury Prevention, pp. 1-10. Published 2025-09-22. Journal Article.
DOI: 10.1080/15389588.2025.2553194 (https://doi.org/10.1080/15389588.2025.2553194)
Abstract
Objective: Trucks are more likely than other vehicle types to be involved in severe casualty crashes, and eliminating traffic violations is crucial to preventing such crashes. However, comprehensive analyses of truck violations and the conditions under which they are associated with severe casualty crashes are lacking. This study aims to identify thematic communities of truck driver violations through a modeling framework that integrates text mining and reliable network analysis.
Methods: This study collected 432 textual reports of severe casualty truck crashes in China from 2013 to 2020. Each report was divided into a crash narrative and metadata, which were preprocessed separately. For the narratives, the ELECTRA model was used for Chinese word segmentation and part-of-speech tagging, and keywords were extracted by combining its output with TF-IDF. The metadata were processed through named entity recognition, geocoding, and related steps, and then merged with the narrative keywords. Association rules were mined with the Apriori algorithm to construct a network with keywords as nodes and lift values as edge weights, which was visualized with the ForceAtlas2 algorithm. The Leiden algorithm was used to detect thematic communities, whose significance was validated by QStest.
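The network construction step can be illustrated with a minimal sketch. It is not the authors' implementation: it restricts the association rules to keyword pairs (the level needed for network edges) instead of running a full Apriori search, and the function name `pairwise_lift`, the support threshold, and the toy keyword sets are all assumptions made for illustration. The lift of a pair is its joint support divided by the product of the individual supports, so values above 1 indicate keywords that co-occur more often than independence would predict.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(reports, min_support=0.2):
    """Return {(a, b): lift} for keyword pairs with joint support >= min_support.

    `reports` is an iterable of keyword collections, one per crash report.
    The function name and the threshold are illustrative assumptions.
    """
    reports = [set(r) for r in reports]
    n = len(reports)
    item_counts = Counter()
    pair_counts = Counter()
    for keywords in reports:
        unique = sorted(keywords)  # sorted so each pair has one canonical order
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    edges = {}
    for (a, b), count in pair_counts.items():
        support_ab = count / n
        if support_ab < min_support:
            continue  # drop rare pairs, as a support threshold would
        # lift = P(a and b) / (P(a) * P(b))
        edges[(a, b)] = support_ab / ((item_counts[a] / n) * (item_counts[b] / n))
    return edges


# Toy, invented keyword sets (not data from the study).
toy_reports = [
    {"overloading", "rural_highway", "curve"},
    {"overloading", "rural_highway", "light_truck"},
    {"speeding", "night", "straight_section"},
    {"speeding", "night", "expressway"},
    {"overloading", "expressway", "night"},
]

edges = pairwise_lift(toy_reports, min_support=0.4)
for (a, b), lift in sorted(edges.items()):
    print(f"{a} -- {b}: lift = {lift:.2f}")
```

The resulting dictionary is a weighted edge list (keywords as nodes, lift as edge weight) that could be passed to a graph library for layout and community detection; those later steps are not shown here.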
Results: Text mining extracted the 77 most relevant keywords from the 432 police narratives. Overloading and speeding emerge as the predominant traffic violations, correlating with 43% and 30% of severe casualty truck crashes, respectively. Four overloading-related and five speeding-related thematic communities are identified as statistically significant. Notably, the circumstances associated with truck overloading and speeding have distinct characteristics. For overloading, the conditions contributing to severe casualty crashes are rural highways with curves or slopes, provincial or national highways in the afternoon, expressways at night, and locations close to signalized intersections. In contrast, five circumstances are linked to speeding: curved or sloped road segments in the afternoon, rural highways in autumn, straight road sections at night, work zones on four-lane roadways, and unsignalized intersections on weekdays. We also extracted vehicle and driver features across these environments, which helps identify key elements for preventing severe casualty truck crashes. For instance, light trucks are more susceptible to severe casualty crashes attributed to overloading on rural highways.
Conclusions: This study demonstrates the advantages of textual data and reliable network analysis. Analyzing text data proves more convenient and yields richer, more comprehensive information while demanding less subjective judgment. The findings of this paper inform subsequent enforcement and engineering measures for mitigating severe casualty truck crashes.
About the journal:
The purpose of Traffic Injury Prevention is to bridge the disciplines of medicine, engineering, public health and traffic safety in order to foster the science of traffic injury prevention. The archival journal focuses on research, interventions and evaluations within the areas of traffic safety, crash causation, injury prevention and treatment.
General topics within the journal's scope are driver behavior, road infrastructure, emerging crash avoidance technologies, crash and injury epidemiology, alcohol and drugs, impact injury biomechanics, vehicle crashworthiness, occupant restraints, pedestrian safety, evaluation of interventions, economic consequences, and emergency and clinical care with specific application to traffic injury prevention. The journal includes full-length papers, review articles, case studies, brief technical notes, and commentaries.