A Sketch-Based Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | Pub Date: 2011-05-23 | DOI: 10.1109/CCGrid.2011.45

Eugenio Cesario, Antonio Grillo, C. Mastroianni, D. Talia

Citations: 10

Abstract

This paper presents the design and implementation of an architecture for analyzing data streams in distributed environments. Specifically, the analysis computes the items and itemsets whose frequency exceeds a given threshold. The mining approach is hybrid: frequent items are computed in a single pass using a sketch algorithm, while frequent itemsets are computed by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of the distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper also reports experimental results obtained with a prototype of the architecture, tested on a Grid composed of two domains handling two different data streams.
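The hybrid approach described in the abstract can be illustrated with a minimal sketch. The abstract does not name the sketch algorithm, so a Count-Min sketch is assumed here purely for illustration. The class `CountMinSketch`, the helpers `sketch_stream` and `mine_pairs`, the cell-wise merge of per-domain sketches, the absolute threshold `MIN_COUNT`, and the toy streams are all hypothetical and not taken from the paper. Itemsets are limited to pairs to keep the example short.

```python
import hashlib
from collections import Counter
from itertools import combinations


class CountMinSketch:
    """Fixed-size frequency summary; estimates never undercount (assumed technique)."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Deterministic per-row hash, so sketches built on different nodes agree.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item):
        # The minimum over rows is the least collision-inflated counter.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))

    def merge(self, other):
        # Cell-wise sum: the merged sketch summarizes both streams together.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketch dimensions must match")
        merged = CountMinSketch(self.width, self.depth)
        merged.rows = [
            [a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(self.rows, other.rows)
        ]
        return merged


def sketch_stream(transactions):
    """Single pass over one domain's stream: count item occurrences in a sketch."""
    sketch = CountMinSketch()
    for transaction in transactions:
        for item in set(transaction):
            sketch.add(item)
    return sketch


def mine_pairs(transactions, sketch, min_count):
    """Further pass: count pairs exactly, but only among items the sketch deems frequent."""
    frequent_items = set()
    pair_counts = Counter()
    for transaction in transactions:
        kept = sorted({i for i in transaction if sketch.estimate(i) >= min_count})
        frequent_items.update(kept)
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_count}
    return frequent_items, frequent_pairs


# Two toy streams standing in for the two domains (hypothetical data).
stream_a = [["a", "b", "c"], ["a", "b"], ["a", "d"]]
stream_b = [["a", "b"], ["b", "c"], ["a", "b", "e"]]
MIN_COUNT = 4  # hypothetical absolute frequency threshold

# Each domain sketches its own stream; only the small sketches are combined.
merged = sketch_stream(stream_a).merge(sketch_stream(stream_b))
frequent_items, frequent_pairs = mine_pairs(stream_a + stream_b, merged, MIN_COUNT)
```

Because a Count-Min estimate can only overcount, every truly frequent item survives the first pass. Hash collisions may let an infrequent item through, but the exact counting in the later pass filters such false positives out of the itemset result.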


