Efficient Parallel Algorithms for k-Center Clustering
J. McClintock, Anthony Wirth
2016 45th International Conference on Parallel Processing (ICPP)
Published: 2016-04-12
DOI: https://doi.org/10.1109/ICPP.2016.22
Citations: 6
Abstract
The k-center problem is a classic NP-hard clustering question. For contemporary massive data sets, RAM-based algorithms become impractical. Although there exist good algorithms for k-center, they are all inherently sequential. In this paper, we design and implement parallel approximation algorithms for k-center. We observe that Gonzalez's greedy algorithm can be efficiently parallelized in several MapReduce rounds; in practice, we find that two rounds are sufficient, leading to a 4-approximation. Empirically, this parallel scheme is about 100 times faster than the sequential Gonzalez algorithm and barely compromises solution quality. We contrast this with an existing parallel algorithm for k-center that offers a 10-approximation. Our analysis reveals that this scheme is often slow, and that its sampling procedure runs only if k is sufficiently small relative to the input size. In practice, it is slightly more effective than Gonzalez's approach, but is slow. To trade off runtime for approximation guarantee, we parameterize this sampling algorithm. We prove a lower bound on the parameter for effectiveness, and find experimentally that with values even lower than the bound, the algorithm is not only faster, but sometimes more effective.
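The two-round parallelization of Gonzalez's greedy algorithm described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `machines` parameter, and the round-robin partitioning are assumptions made for illustration, and the MapReduce rounds are simulated in a single process.

```python
import math


def gonzalez(points, k):
    """Greedy farthest-point traversal; returns up to k centers."""
    centers = [points[0]]  # arbitrary choice of first center
    # dist[i] = distance from points[i] to its nearest chosen center
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        i = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[i])
        dist = [min(d, math.dist(p, points[i])) for d, p in zip(dist, points)]
    return centers


def radius(points, centers):
    """k-center objective: the largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


def two_round_gonzalez(points, k, machines):
    """Simulated two-round scheme (partitioning strategy is an assumption)."""
    # Round 1: each simulated machine runs Gonzalez on its own partition.
    chunks = [points[i::machines] for i in range(machines)]
    candidates = [c for chunk in chunks if chunk for c in gonzalez(chunk, k)]
    # Round 2: one machine runs Gonzalez on the (at most k * machines) candidates.
    return gonzalez(candidates, k)
```

Each round loses at most a factor of 2 relative to the optimal radius, which is where the 4-approximation for two rounds comes from. Calling `radius(points, two_round_gonzalez(points, k, machines))` and comparing it with `radius(points, gonzalez(points, k))` gives a quick check of how much solution quality the parallel scheme gives up on a particular data set.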