A Parallel FP-Growth Mining Algorithm with Load Balancing Constraints for Traffic Crash Data
Authors: Yang Yang, Na Tian, Yunpeng Wang, Zhen-zhou Yuan
Journal: Int. J. Comput. Commun. Control
Published: 2022-07-20
DOI: 10.15837/ijccc.2022.4.4806 (https://doi.org/10.15837/ijccc.2022.4.4806)
Citations: 9
Abstract
Traffic safety is an important part of sustainable roadway development. Freeway traffic crashes typically cause serious casualties and property losses and pose a serious threat to public safety. Identifying the potential correlations among risk factors and revealing their coupling mechanisms is an effective way to explore and identify the causes of freeway crashes. However, existing association rule mining algorithms still have limitations in both efficiency and accuracy. To address this, this research used freeway traffic crash data obtained from WDOT (Washington Department of Transportation) to construct a multi-dimensional, multilevel system for traffic crash analysis. With load balancing taken into account, the FP-Growth (Frequent Pattern-Growth) algorithm was parallelized on the Hadoop platform to achieve efficient and accurate association rule mining on massive amounts of traffic crash data; then, based on the resulting coupling mechanisms among crash precursors, the causes of freeway traffic crashes were identified. The results show that the parallel FP-Growth algorithm with load balancing constraints runs faster on big data than both the conventional FP-Growth algorithm and the parallel FP-Growth algorithm without such constraints. The improved algorithm makes full use of Hadoop cluster resources and is better suited to mining large traffic crash data sets, while retaining the advantages of conventional association rule mining. In addition, the association rule mining model proposed in this research, improved with multi-dimensional interaction, can accurately and efficiently capture the occurrence mechanisms of freeway traffic crashes with serious consequences, which are likely to have lower support.
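The abstract does not spell out how load is balanced across the cluster, so the following is only a minimal single-machine sketch of one common idea: estimate a mining cost for each frequent item, then greedily assign items to worker groups so that the estimated totals stay even. The function names, the rank-based load estimate, and the toy crash attributes are all illustrative assumptions, not the authors' method or data.

```python
import heapq
from collections import Counter


def frequent_items(transactions, min_support):
    """Count in how many transactions each item occurs; keep the frequent ones."""
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c for item, c in counts.items() if c >= min_support}


def estimate_load(freq):
    """Hypothetical load estimate (an assumption, not taken from the paper).

    Items are ranked by descending frequency. An item's conditional pattern
    base can only contain items ranked before it, so a later rank is used
    here as a rough proxy for a larger mining cost.
    """
    ordered = sorted(freq, key=lambda item: (-freq[item], item))
    return {item: rank + 1 for rank, item in enumerate(ordered)}


def balance_groups(loads, n_groups):
    """Greedy assignment: the heaviest remaining item goes to the lightest group."""
    heap = [(0, g) for g in range(n_groups)]  # (current total load, group id)
    heapq.heapify(heap)
    groups = {g: [] for g in range(n_groups)}
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, g = heapq.heappop(heap)
        groups[g].append(item)
        heapq.heappush(heap, (total + load, g))
    return groups


# Toy, made-up crash records; each record is a set of attribute values.
transactions = [
    ["rain", "night", "speeding"],
    ["rain", "night"],
    ["rain", "speeding"],
    ["rain", "wet_road"],
    ["night", "wet_road"],
    ["rain"],
]
freq = frequent_items(transactions, min_support=2)
loads = estimate_load(freq)
groups = balance_groups(loads, n_groups=2)
```

In a Hadoop setting, each group would correspond to one reducer that builds and mines the conditional FP-trees for its items; the sketch covers only the grouping step, since that is where a load balancing constraint would enter.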