A frequent itemset reduction algorithm for global pattern mining on distributed data streams

2017 Tenth International Conference on Contemporary Computing (IC3). Pub Date: 2017-08-01. DOI: 10.1109/IC3.2017.8284320

Shalini, Sanjay Kumar Jain

Citations: 1

Abstract

Extracting global frequent itemsets from big data distributed across multiple data streams, under real-time requirements, is a complex problem. In this article, we propose an algorithm that reduces the number of local frequent itemsets communicated to the root node in order to extract global patterns from multiple distributed data streams. The algorithm sends only local frequent itemsets to the root node instead of summaries of the local data streams. We compress the sets of local frequent itemsets with the Frequent Itemset Reduction (FIR) algorithm before sending them to the root node. We present two indexing structures, the I-list and the Modified Seg-tree (MsegT), to store all local frequent itemsets at the root node. Our experimental study shows that the FIR algorithm reduces communication cost considerably and that MsegT produces substantially better results than the I-list and a few state-of-the-art techniques.
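The flow described in the abstract (mine locally, shrink the set of local frequent itemsets, merge at the root) can be illustrated with a minimal Python sketch. The abstract does not specify how FIR compresses itemsets or how the I-list and MsegT are laid out, so this sketch substitutes a well-known stand-in: each node keeps only closed itemsets (those with no proper superset of equal count), and the root uses a plain dictionary as its index. All function names, the `max_size` cap, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(transactions, min_count, max_size=3):
    """Brute-force count of itemsets (up to max_size items) in one local window."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


def closed_only(frequent):
    """Stand-in reduction (NOT the paper's FIR): drop every itemset that has
    a proper superset with the same count."""
    return {
        s: c for s, c in frequent.items()
        if not any(s < t and c == ct for t, ct in frequent.items())
    }


def expand(closed):
    """Root side: the count of a subset is the max count over its closed supersets."""
    full = {}
    for s, c in closed.items():
        items = sorted(s)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                full[key] = max(full.get(key, 0), c)
    return full


def global_frequent(messages, global_min_count):
    """Sum the reconstructed local counts in a dict (a stand-in for I-list/MsegT).
    Counts from nodes where an itemset was locally infrequent are not included."""
    total = Counter()
    for closed in messages:
        total.update(expand(closed))
    return {s: c for s, c in total.items() if c >= global_min_count}


# Hypothetical windows from two local streams.
node_a = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["d"]]
node_b = [["a", "b"], ["a", "b"], ["b", "c"], ["b", "c"]]

freq_a = local_frequent_itemsets(node_a, min_count=2)
freq_b = local_frequent_itemsets(node_b, min_count=2)
msg_a, msg_b = closed_only(freq_a), closed_only(freq_b)
result = global_frequent([msg_a, msg_b], global_min_count=5)
print(len(freq_a), "->", len(msg_a), "itemsets sent by node A")
```

In this stand-in the reduction is lossless for locally frequent itemsets, because every dropped itemset shares its count with some closed superset that is still sent. Whether FIR has the same property is not stated in the abstract.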

