A Sketch-Based Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

Eugenio Cesario, Antonio Grillo, C. Mastroianni, D. Talia
Published in: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2011-05-23
DOI: 10.1109/CCGrid.2011.45
Citations: 10

Abstract

This paper presents the design and implementation of an architecture for analyzing data streams in distributed environments. In particular, the analysis computes the items and itemsets whose frequency exceeds a given threshold. The mining approach is hybrid: frequent items are computed in a single pass using a sketch algorithm, while frequent itemsets are computed by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper also reports experimental results obtained with a prototype of the architecture, tested on a Grid composed of two domains handling two different data streams.
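The hybrid scheme can be illustrated with a small Python sketch. The abstract does not name the sketch algorithm, the frequency measure, or the itemset algorithm, so the following are assumptions rather than the authors' method: a Count-Min sketch stands in for the single-pass frequent-item step, support is measured as the fraction of transactions containing an item or itemset, and an Apriori-style level-wise search stands in for the multi-pass itemset phase (reporting itemsets of size 2 up to `max_size`). `CountMinSketch`, `DomainMiner`, `global_frequent_items`, `frequent_itemsets`, the threshold `phi`, and the default `width`/`depth` are hypothetical names and values.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Count-Min sketch: depth rows of width counters; estimates never undercount."""

    def __init__(self, width=272, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, derived by salting a stable digest with the row id.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for r in range(self.depth):
            self.table[r][self._index(item, r)] += count

    def estimate(self, item):
        return min(self.table[r][self._index(item, r)] for r in range(self.depth))

    def merged_with(self, other):
        # Same shape and hashing: summing cell by cell gives the sketch of both streams.
        out = CountMinSketch(self.width, self.depth)
        out.table = [[a + b for a, b in zip(ra, rb)]
                     for ra, rb in zip(self.table, other.table)]
        return out


class DomainMiner:
    """Hypothetical per-domain miner: one pass over the local stream plus a buffered window."""

    def __init__(self, phi, width=272, depth=5):
        self.phi = phi
        self.sketch = CountMinSketch(width, depth)
        self.n = 0               # transactions seen so far in this domain
        self.candidates = set()  # items whose estimate reached phi * n on arrival
        self.window = []         # transactions kept for the later multi-pass phase

    def observe(self, transaction):
        self.n += 1
        items = set(transaction)
        self.window.append(frozenset(items))
        for item in items:
            self.sketch.add(item)
            if self.sketch.estimate(item) >= self.phi * self.n:
                self.candidates.add(item)


def global_frequent_items(miners, phi):
    # An item frequent over all domains is frequent in at least one of them, so
    # filtering the union of local candidates against the merged sketch misses none.
    merged = miners[0].sketch
    for m in miners[1:]:
        merged = merged.merged_with(m.sketch)
    n = sum(m.n for m in miners)
    cands = set().union(*(m.candidates for m in miners))
    return {i for i in cands if merged.estimate(i) >= phi * n}


def frequent_itemsets(miners, items, phi, max_size=3):
    # Apriori-style level-wise passes over each domain's window; only per-domain
    # candidate counts are combined, never the transactions themselves.
    n = sum(m.n for m in miners)
    level = {frozenset([i]) for i in items}
    result = {}
    for k in range(2, max_size + 1):
        cands = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all of its (k-1)-subsets were frequent.
        cands = {c for c in cands if all(c - {x} in level for x in c)}
        counts = Counter()
        for m in miners:  # one pass per domain per level
            for t in m.window:
                counts.update(c for c in cands if c <= t)
        level = {c for c in cands if counts[c] >= phi * n}
        result.update({c: counts[c] for c in level})
        if not level:
            break
    return result
```

Usage: create one `DomainMiner` per domain, call `observe` on each arriving transaction, then pass the list of miners to `global_frequent_items` and feed its result to `frequent_itemsets`. Under these assumptions the sketch step can admit false-positive items but no false negatives; false positives cannot yield spurious itemsets because itemset supports are counted exactly in the later passes.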