A Parallel K-Means Clustering Algorithm with MPI

2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming. Pub Date: 2011-12-09. DOI: 10.1109/PAAP.2011.17

Jing Zhang, Gongqing Wu, Xuegang Hu, Shiying Li, Shuilong Hao


Citations: 49

Abstract


Clustering is one of the most popular methods for data analysis and is prevalent in many disciplines, such as image segmentation, bioinformatics, pattern recognition, and statistics. K-means is the most popular and simplest clustering algorithm because of its ease of implementation, efficiency, and empirical success. However, real-world applications produce huge volumes of data, so handling these data efficiently in important mining tasks is a challenging and significant issue. MPI (Message Passing Interface), a message-passing programming model, offers high performance, scalability, and portability. Motivated by this, this paper proposes a parallel K-means clustering algorithm with MPI, called MKmeans. The algorithm allows K-means clustering to be applied effectively in a parallel environment. Experiments demonstrate that MKmeans is relatively stable and portable, and that it incurs low time overhead on large data sets.
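The abstract does not describe how MKmeans distributes its work, so the following is only a minimal sketch of the common data-parallel pattern for K-means, not the authors' implementation. It assumes each process holds a share of the points, computes per-cluster partial sums locally, and then combines them with a sum reduction (the role MPI_Allreduce would play in a real MPI program). To stay self-contained, the ranks and the reduction are simulated in plain Python; the names local_step, allreduce_sum, and parallel_kmeans are hypothetical.

```python
# Hypothetical sketch: data-parallel K-means with the MPI reduction simulated
# in plain Python. This is NOT the paper's MKmeans implementation.
from math import dist


def local_step(points, centroids):
    """Work of one simulated rank: assign its points, accumulate partial sums."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Index of the nearest centroid by Euclidean distance.
        nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def allreduce_sum(partials):
    """Stand-in for a sum all-reduce: element-wise sum over all ranks."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    return total_sums, total_counts


def parallel_kmeans(points, centroids, n_ranks=2, max_iter=100, tol=1e-9):
    """Iterate until no centroid moves more than tol (or max_iter is reached)."""
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    # Round-robin partition: chunk r plays the role of rank r's local data.
    chunks = [points[r::n_ranks] for r in range(n_ranks)]
    for _ in range(max_iter):
        partials = [local_step(chunk, centroids) for chunk in chunks]
        sums, counts = allreduce_sum(partials)
        # An empty cluster keeps its previous centroid.
        updated = [
            [s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
            for j in range(k)
        ]
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift < tol:
            break
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = parallel_kmeans(points, [(0, 0), (10, 10)], n_ranks=2)
```

In this pattern only the k partial sums and counts cross process boundaries each iteration, never the points themselves, which is why the communication cost is independent of the data set size. In an actual MPI program the list comprehension over chunks would disappear: every process would call local_step on its own data and then take part in a collective reduction.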

