MapReduce-based frequent itemset mining for analysis of electronic evidence

Xueqing Jiang, Guozi Sun
2013 8th International Workshop on Systematic Approaches to Digital Forensics Engineering (SADFE), published 2013-11-01
DOI: 10.1109/SADFE.2013.6911549 (https://doi.org/10.1109/SADFE.2013.6911549)
Citations: 4

Abstract

Association rule mining can extract evidence relevant to computer crime from massive data sets and from the associations among itemsets, and can further reveal crime trends and connections among different crimes. The resulting rules give police clues and criteria for solving cases and preventing crime. Frequent itemset mining (FIM) plays a fundamental role in mining associations and correlations and in many real-world data mining fields, such as electronic evidence analysis. FP-growth is one of the best-known FIM algorithms for discovering frequent patterns. As the data grow, time and space costs become the bottleneck of FP-growth-based mining algorithms. SPO-tree, an existing incremental frequent pattern mining algorithm, performs incremental mining with a single scan of the data. How to apply this algorithm more effectively to the analysis of electronic evidence is the focus of this paper. Previous research has paid little attention to the case where already-mined frequent items must be updated or where only a small amount of data is inserted, and existing algorithms are not well suited to this problem, especially in the forensic domain. We therefore propose PISPO, a novel parallelized algorithm based on the cloud-computing framework MapReduce, which is widely used to cope with large-scale data. PISPO captures both the content and the state of the original and the changed transaction datasets and distributes them to the SPO-tree.
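To make the idea of MapReduce-style frequent itemset counting concrete, the following is a minimal sketch in Python. It is not PISPO and does not build an SPO-tree or an FP-tree: it counts itemset support by brute force and simulates the map and reduce phases in a single process. The function names (`map_phase`, `reduce_phase`, `frequent_itemsets`), the `max_size` cap, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_phase(partition, max_size=2):
    """Map step (illustrative): count every itemset of up to max_size
    items that occurs in the transactions of one data partition."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))  # canonical order, no duplicates
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def reduce_phase(partial_counts):
    """Reduce step (illustrative): sum per-partition counts into
    global support counts."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


def frequent_itemsets(partitions, min_support, max_size=2):
    """Return the itemsets whose global support is at least min_support."""
    totals = reduce_phase(map_phase(p, max_size) for p in partitions)
    return {itemset: n for itemset, n in totals.items() if n >= min_support}


# Hypothetical toy data: two partitions of event-log "transactions".
partitions = [
    [["login", "usb", "delete"], ["login", "usb"]],
    [["login", "delete"], ["usb", "delete"], ["login", "usb", "delete"]],
]

result = frequent_itemsets(partitions, min_support=3)
strict = frequent_itemsets(partitions, min_support=4)
```

In a real MapReduce deployment each partition would be processed on a separate worker and the framework would shuffle the counts by itemset key. An incremental method such as the SPO-tree described above is meant to avoid recounting from scratch when new transactions arrive, which this sketch does not attempt.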