A semantic embedding methodology for motor vehicle crash records: A case study of traffic safety in Manhattan Borough of New York City
Yuxuan Wang, Ruoxin Xiong, Hao Yu, Jie Bao, Zhao Yang
Journal of Transportation Safety & Security, pp. 1913-1933. Published 2021-10-27. DOI: 10.1080/19439962.2021.1994681
Citations: 3
Abstract
This study introduces a hybrid Latent Dirichlet Allocation (LDA) model to uncover hidden crash patterns in a large-scale crash dataset. External semantic descriptions are attached to the raw GPS coordinates of crash events. The K-means clustering algorithm is first applied to determine the land use characteristics of crash points by grouping the surrounding Points of Interest (POIs). Each crash record is then transformed into a formalized label consisting of land use, Annual Average Daily Traffic (AADT), and time stamp, which allows massive traffic crash data to be analyzed as a document corpus. Finally, an LDA-based, data-driven model is applied to discover hidden crash patterns from crash records combined with the external semantic information. The approach is verified using motor vehicle crash data from the borough of Manhattan in New York City. This semantic analysis of crash records provides an effective method for investigating hidden information in traffic crashes. Identifying spatial-temporal patterns in motor vehicle crashes can provide insight into underlying traffic behaviors to support policy-making and resource allocation.
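The labeling step described above, in which each crash record becomes a "word" built from land use, AADT, and time stamp, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The AADT thresholds, the time-of-day bins, the label format, and the `zone_id` key used to group crashes into "documents" are all assumptions made for the example, since the abstract does not specify them. The land use category is assumed to come from a prior POI clustering step.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Crash:
    zone_id: str         # hypothetical grouping key that defines one "document"
    land_use: str        # assumed output of a prior POI clustering step
    aadt: int            # Annual Average Daily Traffic at the crash site
    timestamp: datetime  # when the crash occurred


def aadt_level(aadt: int) -> str:
    """Bin AADT into a coarse level (thresholds are illustrative only)."""
    if aadt < 10_000:
        return "lowAADT"
    if aadt < 30_000:
        return "midAADT"
    return "highAADT"


def time_slot(ts: datetime) -> str:
    """Bin a time stamp into day type plus part of day (illustrative bins)."""
    day = "weekend" if ts.weekday() >= 5 else "weekday"
    if 6 <= ts.hour < 10:
        part = "amPeak"
    elif 10 <= ts.hour < 16:
        part = "midday"
    elif 16 <= ts.hour < 20:
        part = "pmPeak"
    else:
        part = "night"
    return f"{day}-{part}"


def to_label(crash: Crash) -> str:
    """Turn one crash record into a single token (a 'word' for topic modeling)."""
    return "_".join(
        [crash.land_use, aadt_level(crash.aadt), time_slot(crash.timestamp)]
    )


def build_corpus(crashes):
    """Group labels by zone_id and count them: one bag of words per document."""
    docs = defaultdict(Counter)
    for crash in crashes:
        docs[crash.zone_id][to_label(crash)] += 1
    return {zone: dict(counts) for zone, counts in docs.items()}


# Made-up sample records, for illustration only.
crashes = [
    Crash("zone_A", "commercial", 42_000, datetime(2019, 3, 4, 8, 15)),  # Monday
    Crash("zone_A", "commercial", 42_000, datetime(2019, 3, 5, 8, 40)),  # Tuesday
    Crash("zone_B", "residential", 6_500, datetime(2019, 3, 9, 23, 5)),  # Saturday
]
corpus = build_corpus(crashes)
```

The resulting per-document token counts have the bag-of-words shape that an LDA implementation typically expects as input. Which LDA library is used, and how documents are actually delimited in the study, are left open here.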