基于MapReduce的并行层次关联传播

2014 IEEE International Conference on Cloud Engineering Pub Date : 2013-12-02 DOI:10.1109/CloudCom.2013.97

D. Rose, J. Rouly, Rana Haber, N. Mijatovic, A. Peter

{"title":"基于MapReduce的并行层次关联传播","authors":"D. Rose, J. Rouly, Rana Haber, N. Mijatovic, A. Peter","doi":"10.1109/CloudCom.2013.97","DOIUrl":null,"url":null,"abstract":"The accelerated evolution and explosion of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Paramount amongst the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques.","PeriodicalId":273902,"journal":{"name":"2014 IEEE International Conference on Cloud Engineering","volume":"96 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Parallel Hierarchical Affinity Propagation with MapReduce\",\"authors\":\"D. Rose, J. Rouly, Rana Haber, N. Mijatovic, A. Peter\",\"doi\":\"10.1109/CloudCom.2013.97\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The accelerated evolution and explosion of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Paramount amongst the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques.\",\"PeriodicalId\":273902,\"journal\":{\"name\":\"2014 IEEE International Conference on Cloud Engineering\",\"volume\":\"96 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE International Conference on Cloud Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CloudCom.2013.97\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE International Conference on Cloud Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CloudCom.2013.97","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

互联网和社交媒体的加速发展和爆炸正在产生大量的数据(以泽字节为单位)。在从海量数据中操纵和提取可操作情报的愿望中，最重要的是需要可扩展的、性能敏感的分析算法。为了直接解决这一需求，我们提出了一种新的基于范例的聚类算法的MapReduce实现，称为Affinity Propagation。我们的并行化策略扩展到多层分层亲和传播算法，并支持以最小的自由参数分层聚合非结构化数据，原则上只需要数据点之间的相似性度量。我们详细介绍了我们的方法的线性运行时复杂度，克服了原始算法的极限二次复杂度。我们的聚类方法在各种合成和真实数据集(例如图像和点数据)上的实验验证证明了我们与其他最先进的MapReduce聚类技术的竞争力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Parallel Hierarchical Affinity Propagation with MapReduce

The accelerated evolution and explosion of the Internet and social media is generating voluminous quantities of data (on zettabyte scales). Paramount amongst the desires to manipulate and extract actionable intelligence from vast big data volumes is the need for scalable, performance-conscious analytics algorithms. To directly address this need, we propose a novel MapReduce implementation of the exemplar-based clustering algorithm known as Affinity Propagation. Our parallelization strategy extends to the multilevel Hierarchical Affinity Propagation algorithm and enables tiered aggregation of unstructured data with minimal free parameters, in principle requiring only a similarity measure between data points. We detail the linear run-time complexity of our approach, overcoming the limiting quadratic complexity of the original algorithm. Experimental validation of our clustering methodology on a variety of synthetic and real data sets (e.g. images and point data) demonstrates our competitiveness against other state-of-the-art MapReduce clustering techniques.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE International Conference on Cloud Engineering

自引率

0.00%

发文量