Vinayak Malaghan, Francesco Pilla, Pavlos Tafidis, Brian Rogers
Title: Behavioral insights and hotspot identification: Integrating natural language processing, machine learning and geospatial analyses of cyclist crashes
Journal: Transportation Research Part F: Traffic Psychology and Behaviour, Volume 113, Pages 452-480 (journal article, published 2025-05-20)
DOI: 10.1016/j.trf.2025.05.005
URL: https://www.sciencedirect.com/science/article/pii/S136984782500169X
Citations: 0
Abstract
In response to the growing promotion and adoption of cycling, ensuring cyclist safety is paramount. Understanding the behavioural causes of crashes and identifying collision hotspots are important, but these efforts are hindered by underreporting and by limited data on all types of incidents, including near misses. To address these challenges, this study analyses text reported on dedicated active travel collision platforms to categorize incidents and uncover the behavioural patterns that contribute to collisions. The reported text is grouped into distinct themes by applying Term Frequency-Inverse Document Frequency (TF-IDF) vectorization followed by clustering. In addition, the Getis-Ord Gi* statistic, a geospatial technique, is computed to identify spatial clustering of collisions and to classify geographical regions as hotspots or cold spots. The key themes contributing to collisions are: ‘close pass incidents,’ ‘blocked bicycle lanes,’ ‘cyclist incidents on tram tracks,’ ‘roundabout incidents,’ ‘left turn incidents,’ ‘incidents between buses and cyclists,’ ‘incidents involving cyclists and trucks,’ ‘incidents related to traffic lights and pedestrian crossings,’ and ‘turning incidents at intersections.’ The hotspots for these incidents lie at or near the intersections of regional roads in the Central Business District (CBD) of Dublin, Ireland, and on the peripheral regional roads surrounding the CBD. This study advances the state of the art by using an alternative data source, the ‘crash descriptions’ of cyclist crashes, together with machine learning techniques and advanced geospatial analyses. The insights from the distinct themes and identified hotspots improve understanding of risky behaviours and their spatial distribution, contributing to ongoing efforts to foster a safer cycling environment.
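The abstract names TF-IDF vectorization followed by clustering but does not say which clustering algorithm, tokenizer or TF-IDF weighting the authors used. The dependency-free sketch below illustrates the general idea under stated assumptions: a tiny hypothetical stop-word list, the smoothed IDF variant ln((1 + n) / (1 + df)) + 1, and spherical k-means (k-means on unit-length vectors using cosine similarity) as a stand-in for the unspecified clustering step. All helper names (`tokenize`, `tfidf`, `spherical_kmeans`, `top_terms`) are hypothetical, and the six crash descriptions are invented for illustration, not taken from the study's data.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical minimal stop-word list (an assumption; real pipelines use a fuller one).
STOP_WORDS = {"a", "an", "and", "by", "i", "in", "me", "my", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def normalise(vec):
    """Scale a sparse {term: weight} vector to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity when both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def tfidf(docs):
    """One unit-length TF-IDF vector per document.

    TF = raw term count; IDF = ln((1 + n) / (1 + df)) + 1 (assumed smoothed variant).
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [normalise({t: c * idf[t] for t, c in Counter(tokens).items()})
            for tokens in token_lists]


def spherical_kmeans(vectors, k, max_iter=50):
    """k-means with cosine similarity; returns (labels, unit-length centroids)."""
    # Deterministic farthest-first seeding: start from document 0, then repeatedly
    # add the document least similar to every centroid chosen so far.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(dot(vectors[i], c) for c in centroids))
        centroids.append(vectors[far])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for j in range(k):
            summed = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        summed[t] += w
            if summed:  # an empty cluster keeps its previous centroid
                centroids[j] = normalise(summed)
    return labels, centroids


def top_terms(centroid, n):
    """Highest-weighted centroid terms (ties broken alphabetically): a rough theme label."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


# Invented crash descriptions, for illustration only.
docs = [
    "Car made a close pass, overtaking me with inches to spare",
    "Close pass by a van overtaking on a narrow road",
    "Wheel caught in the tram tracks and I fell",
    "Slipped on wet tram tracks near the stop",
    "Delivery truck parked in the bike lane, blocked the lane",
    "Bike lane blocked by parked cars again",
]
labels, centroids = spherical_kmeans(tfidf(docs), k=3)
print(labels)  # → [0, 0, 1, 1, 2, 2]
for j, c in enumerate(centroids):
    print(j, top_terms(c, 3))
```

Because every vector is normalised to unit length, the dot product equals cosine similarity, which suits short, sparse texts whose lengths vary. With real reports, the number of clusters would have to be chosen (for example by inspecting several values of k), and the highest-weighted terms only suggest a theme name that a human analyst would still confirm.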
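The Getis-Ord Gi* statistic compares the sum of values in a location's neighbourhood (including the location itself) with the sum expected if the values were spread at random, and it is read as a z-score: large positive values indicate hotspots and large negative values indicate cold spots. The abstract does not state the spatial units, the weighting scheme, the distance band or the significance level the authors used, so the sketch below assumes binary fixed-distance weights, a hypothetical 5 x 5 grid of invented incident counts, and the conventional two-tailed cut-off of ±1.96 (about 95% confidence). The function names `getis_ord_gi_star` and `classify` are hypothetical; in practice a GIS package would normally compute this statistic.

```python
import math


def getis_ord_gi_star(coords, values, threshold):
    """Getis-Ord Gi* z-score for each location.

    Gi* = (sum_j w_ij x_j - mean * sum_j w_ij)
          / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    where S = sqrt(sum_j x_j^2 / n - mean^2).
    Assumed weights: w_ij = 1 if dist(i, j) <= threshold, else 0. Location i is
    its own neighbour (w_ii = 1), which distinguishes Gi* from Gi.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two locations")
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(x * x for x in values) / n - mean * mean))
    if s == 0:
        raise ValueError("all values are equal; Gi* is undefined")
    scores = []
    for ci in coords:
        w = [1.0 if math.dist(ci, cj) <= threshold else 0.0 for cj in coords]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * x for wij, x in zip(w, values)) - mean * sum_w
        spread = (n * sum_w2 - sum_w ** 2) / (n - 1)
        # spread is 0 only if every location neighbours i; report 0 in that case.
        scores.append(numerator / (s * math.sqrt(spread)) if spread > 0 else 0.0)
    return scores


def classify(z, critical=1.96):
    """Label a z-score as hot, cold or not significant (assumed two-tailed ~95% cut-off)."""
    if z > critical:
        return "hot"
    if z < -critical:
        return "cold"
    return "not significant"


# Hypothetical grid: 10 reported incidents in each central 3 x 3 cell, 0 elsewhere.
cells = [(i, j) for i in range(5) for j in range(5)]
counts = [10 if 1 <= i <= 3 and 1 <= j <= 3 else 0 for i, j in cells]
# threshold=1.5 makes each cell's neighbourhood itself plus its (up to) 8 adjacent cells.
scores = getis_ord_gi_star(cells, counts, threshold=1.5)
print(round(scores[12], 3))  # centre cell (2, 2) → 4.899
print([cells[k] for k, z in enumerate(scores) if classify(z) == "hot"])
# → [(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)]
```

Only the centre cell and its four edge-adjacent cells qualify as hotspots here: the corner cells of the high-count block also have a value of 10, but their neighbourhoods include too many zero cells. This reflects how Gi* flags clusters of high values rather than isolated high values. The sketch applies no correction for testing many locations at once, which some GIS tools offer as an option.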
About the journal:
Transportation Research Part F: Traffic Psychology and Behaviour focuses on the behavioural and psychological aspects of traffic and transport. The aim of the journal is to enhance theory development, to improve the quality of empirical studies and to stimulate the application of research findings in practice. TRF provides a focus and a means of communication for the considerable research activity now being carried out in this field. The journal provides a forum for transportation researchers, psychologists, ergonomists, engineers and policy-makers with an interest in traffic and transport psychology.