Applying a two-step cluster algorithm in traffic accident data analysis

Le Khanh Giang, Huong Ho Thi Lan, Do Van Manh, Tran Quang Hoc
Transport and Communications Science Journal. Published 2024-05-15. DOI: 10.47869/tcsj.75.4.16 (https://doi.org/10.47869/tcsj.75.4.16)
Citations: 0

Abstract

Cluster analysis is often the first step in organizing heterogeneous data into homogeneous groups. Choosing an effective clustering approach and a suitable number of clusters for a traffic accident dataset can be complex. This study evaluates the effectiveness of the k-means and two-step cluster methods, then applies the two-step cluster method together with GIS to traffic accident datasets from 2015 to 2017 in Hanoi, Vietnam. First, the two-step cluster method achieved a Silhouette score of 0.563, compared with 0.341 for k-means; a higher Silhouette score indicates better-defined clusters. Second, the research suggests combining the two-step cluster method with GIS for analyzing traffic accident datasets. The outcome identifies five typical types of accidents in Hanoi. In addition, the locations of the various accident types were mapped, enabling traffic officials to recommend precise and urgent countermeasures. Importantly, the clustering results show that the two-step cluster method yields a significantly higher rate of homogeneous data within clusters than the k-means method. The study demonstrates that the two-step cluster method is more effective than k-means not only in clustering ability but also in data pre-processing. The results give authorities a more detailed understanding of typical traffic accident patterns in Hanoi. The methods could also potentially be applied to other regions, providing an additional avenue for analysis.
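The abstract compares the two methods by Silhouette score. As a rough illustration of what that score measures, here is a minimal pure-Python sketch of the mean silhouette coefficient. The toy points, the labelings, and the choice of Euclidean distance are all assumptions made for illustration; they are not the study's data, and the abstract does not say which distance measure the authors used.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance assumed)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s = 0 by convention
        # a: mean distance from p to the other members of its own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom  # s lies in [-1, 1]
    return total / len(points)


# Hypothetical toy data: two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # each tight pair forms its own cluster
bad_labels = [0, 1, 0, 1]   # each cluster mixes points from both pairs

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good labeling: {good_score:.3f}, bad labeling: {bad_score:.3f}")
```

The labeling that matches the actual grouping scores close to 1, while the labeling that mixes the groups scores below 0. That is the sense in which a higher score indicates better-defined clusters.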