SOM Clustering Using Spark-MapReduce
Authors: Tugdual Sarazin, Hanene Azzag, M. Lebbah
Published in: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops
Publication date: 2014-05-19
DOI: 10.1109/IPDPSW.2014.192 (https://doi.org/10.1109/IPDPSW.2014.192)
Citations: 29
Abstract
In this paper, we consider the design of clustering algorithms that can run under MapReduce on the Spark platform, one of the most popular programming environments for processing large datasets. We focus on the practical and widely used serial Self-Organizing Map (SOM) clustering algorithm. SOM is one of the best-known unsupervised learning algorithms and is useful for cluster analysis of large quantities of data. We have designed two scalable implementations of the SOM-MapReduce algorithm. We report experiments that demonstrate their performance in terms of classification accuracy, Rand index, and speedup, using real and synthetic data with up to 100 million points and varying numbers of cores.
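
The abstract does not describe the two implementations, so the following is only a minimal sketch of how one batch SOM iteration can be split into a map step and a reduce step. It assumes the standard batch update rule with a Gaussian neighborhood on the map grid; every function name and parameter here (`bmu`, `map_point`, `reduce_pair`, `som_iteration`, `temperature`) is hypothetical and not taken from the paper. It uses plain Python so that it runs without a cluster; on Spark, `map_point` and `reduce_pair` could presumably be passed to an RDD's `map` and `reduce`.

```python
import math
from functools import reduce


def bmu(point, prototypes):
    """Index of the prototype closest to the point (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda c: sum((x - w) ** 2 for x, w in zip(point, prototypes[c])),
    )


def neighborhood(c, r, grid, temperature):
    """Gaussian kernel between map cells c and r (assumed kernel, not from the paper)."""
    d2 = sum((a - b) ** 2 for a, b in zip(grid[c], grid[r]))
    return math.exp(-d2 / (2.0 * temperature ** 2))


def map_point(point, prototypes, grid, temperature):
    """Map step: one point's contribution to each cell's numerator and denominator."""
    winner = bmu(point, prototypes)
    weights = [neighborhood(c, winner, grid, temperature) for c in range(len(prototypes))]
    numer = [[k * x for x in point] for k in weights]
    return numer, weights


def reduce_pair(a, b):
    """Reduce step: element-wise sum of two partial (numerator, denominator) pairs."""
    numer = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    denom = [x + y for x, y in zip(a[1], b[1])]
    return numer, denom


def som_iteration(points, prototypes, grid, temperature):
    """One batch update: each new prototype is a kernel-weighted mean of all points."""
    partials = (map_point(p, prototypes, grid, temperature) for p in points)
    numer, denom = reduce(reduce_pair, partials)
    return [[x / d for x in row] for row, d in zip(numer, denom)]


# Toy usage: a 1x2 map and two well-separated groups of 2-D points.
grid = [(0, 0), (0, 1)]
prototypes = [[0.0, 0.0], [1.0, 1.0]]
points = [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0], [1.0, 1.1]]
for temperature in (1.0, 0.5, 0.2):  # shrinking neighborhood radius
    prototypes = som_iteration(points, prototypes, grid, temperature)
```

Because the reduce step is an associative, commutative sum, partial results can be combined in any order across workers, which is what makes this formulation a natural fit for MapReduce. The prototypes are read-only within an iteration, so in a distributed setting they would need to be shipped to every worker before each pass.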