MapReduce-based frequent itemset mining for analysis of electronic evidence

2013 8th International Workshop on Systematic Approaches to Digital Forensics Engineering (SADFE). Pub Date: 2013-11-01. DOI: 10.1109/SADFE.2013.6911549

Xueqing Jiang, Guozi Sun


Citations: 4

Abstract

Association rule mining can extract evidence relevant to computer crime from massive data sets, uncover associations among itemsets, and further reveal crime trends and connections among different crimes. The resulting rules give police clues and criteria that help them solve cases and prevent crime. Frequent itemset mining (FIM) plays a fundamental role in mining associations and correlations, and in many real-world data mining applications such as electronic evidence analysis. FP-growth is one of the best-known FIM algorithms for discovering frequent patterns. As the data grows, time and space costs become the bottleneck of FP-growth-based mining algorithms. SPO-tree, an existing incremental frequent pattern mining algorithm, can perform incremental mining with a single scan of the data. How to apply this algorithm more effectively to the analysis of electronic evidence is the focus of this paper. Previous research has paid little attention to the case in which already-mined frequent items must be updated or a small amount of data is inserted, and existing algorithms are poorly suited to this problem, especially in the forensic domain. In this paper, we therefore propose PISPO, a novel parallelized algorithm built on MapReduce, a cloud-computing framework widely used to cope with large-scale data. PISPO captures both the content and the state of the changed and the original portions of the transaction dataset and distributes them to the SPO-tree.
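To make the MapReduce view of frequent itemset mining concrete, the following is a minimal single-process sketch of support counting in a map/reduce style. It is not the PISPO algorithm and does not build an SPO-tree or an FP-tree; it uses brute-force enumeration of small itemsets purely for illustration. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the `max_size` cap, and the sample event names are assumptions that do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def map_phase(transactions, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items per transaction."""
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so equal itemsets match
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts for each itemset key."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(partitions, min_support, max_size=2):
    """Count each partition independently, merge, and keep itemsets meeting min_support."""
    totals = Counter()
    for partition in partitions:  # each partition stands in for one mapper's input split
        totals.update(reduce_phase(map_phase(partition, max_size)))
    return {itemset: n for itemset, n in totals.items() if n >= min_support}


# Hypothetical event logs split into two partitions (illustrative data only).
partitions = [
    [["login", "usb_mount", "file_copy"], ["login", "file_copy"]],
    [["login", "usb_mount"], ["usb_mount", "file_copy", "login"]],
]
result = frequent_itemsets(partitions, min_support=3)
```

With this sample data, `result` keeps the three single items and the pairs `("file_copy", "login")` and `("login", "usb_mount")`, while `("file_copy", "usb_mount")` falls below the threshold. In a real MapReduce job the partitions would be processed on separate workers, and an incremental method would avoid recounting the original data when new transactions arrive.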
