Design of intelligent k-means based on Spark for big data clustering
Ilham Kusuma, M. A. Ma'sum, Novian Habibie, W. Jatmiko, H. Suhartanto
2016 International Workshop on Big Data and Information Security (IWBIS), October 2016. DOI: 10.1109/IWBIS.2016.7872895
Citations: 21
Abstract
The growth of data has brought us into the era of big data, in which the volume of data can no longer be processed in conventional computing environments. Many computational environments have been developed to process big data. One of them is Hadoop, which provides a distributed file system and the MapReduce framework. Spark is a newer framework that can be combined with Hadoop and run on top of it. In this paper, we design an intelligent k-means algorithm based on Spark for big data clustering. Our design processes the data in batches instead of operating on the original Resilient Distributed Dataset (RDD). We compare our design with an implementation that operates on the original RDD. Experimental results show that the batch-based implementation is faster than the implementation using the original RDD.
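The abstract does not spell out the algorithm, so the following is only a hedged illustration. It assumes that "intelligent k-means" refers to Mirkin-style iK-Means, where an anomalous-pattern pass chooses both the number of clusters and the initial centroids before ordinary k-means iterations. The centroid update aggregates per-batch partial sums, loosely mirroring a job that handles a whole batch of records at a time rather than one RDD record at a time. All function names and the parameters `min_size` and `batch_size` are hypothetical. This is not the authors' code, it does not use Spark, and it says nothing about their timing results.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def anomalous_pattern_centroids(points: List[Point], min_size: int = 2,
                                max_iter: int = 100) -> List[Point]:
    """Assumed iK-Means initialisation: repeatedly peel off the cluster that
    forms around the point farthest from the grand mean (the fixed origin)."""
    origin = mean(points)
    remaining = list(range(len(points)))
    centroids: List[Point] = []
    while remaining:
        far = max(remaining, key=lambda i: sq_dist(points[i], origin))
        if sq_dist(points[far], origin) == 0.0:
            members = remaining  # all remaining points sit at the grand mean
        else:
            c, members = points[far], None
            for _ in range(max_iter):
                # Points closer to the tentative centroid than to the origin.
                new_members = [i for i in remaining
                               if sq_dist(points[i], c) < sq_dist(points[i], origin)]
                if new_members == members:
                    break
                members = new_members
                c = mean([points[i] for i in members])
        if len(members) >= min_size:  # hypothetical rule: drop tiny clusters
            centroids.append(mean([points[i] for i in members]))
        member_set = set(members)
        remaining = [i for i in remaining if i not in member_set]
    return centroids


def batch_partial_sums(batch, centroids, dim):
    """Per-batch aggregation: coordinate sums and counts for each cluster."""
    k = len(centroids)
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in batch:
        j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans_batched(points: List[Point], centroids: List[Point],
                   batch_size: int = 100, max_iter: int = 50) -> List[Point]:
    """Lloyd iterations whose update combines the partial sums of each batch."""
    dim = len(points[0])
    batches = [points[i:i + batch_size] for i in range(0, len(points), batch_size)]
    for _ in range(max_iter):
        k = len(centroids)
        total_s = [[0.0] * dim for _ in range(k)]
        total_c = [0] * k
        for batch in batches:  # in a cluster, batches would be processed in parallel
            s, c = batch_partial_sums(batch, centroids, dim)
            for j in range(k):
                total_c[j] += c[j]
                for d in range(dim):
                    total_s[j][d] += s[j][d]
        new = [tuple(x / total_c[j] for x in total_s[j]) if total_c[j] else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
init = anomalous_pattern_centroids(pts)
print(init)                                      # [(0.5, 0.5), (10.5, 10.5)]
print(kmeans_batched(pts, init, batch_size=3))   # [(0.5, 0.5), (10.5, 10.5)]
```

With `batch_size=1` the update falls back to record-at-a-time aggregation and yields the same centroids. In this sketch, batching changes only how the work is grouped, not the clustering result.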