Qing He, Yunlong Ma, Qun Wang, Fuzhen Zhuang, Zhongzhi Shi
Parallel Outlier Detection Using KD-Tree Based on MapReduce
2011 IEEE Third International Conference on Cloud Computing Technology and Science, 2011-11-29
DOI: 10.1109/CloudCom.2011.20
Citations: 11
Abstract
Distributed and parallel algorithms have attracted a great deal of research interest in recent decades as a way to handle large-scale data sets in real-world applications. In this paper, we focus on a parallel implementation of a KD-Tree based outlier detection method for large-scale data sets. The KD-Tree based method is one of the state-of-the-art outlier detection methods and has proven to be effective. However, its serial implementation cannot process large-scale data sets efficiently. Building on MapReduce, a current and powerful parallel programming framework, we propose a parallel KD-Tree based outlier detection algorithm (PKDTree for short). Experimental results demonstrate the efficiency of PKDTree in terms of the evaluation criteria of scaleup, speedup, and sizeup.
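The abstract does not spell out how PKDTree works internally, so the following is only a minimal sketch of one common way to combine a KD-tree with a map/reduce pattern, not the authors' algorithm. It assumes that a point's outlier score is its distance to its k-th nearest neighbor, that a single KD-tree over the full data set is available to every mapper, and that the reducer simply keeps the top-n scores. All names (`build_kdtree`, `map_partition`, `reduce_top_n`) are hypothetical, and the "map" and "reduce" steps are plain Python functions rather than calls to a real MapReduce framework.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


class Node:
    """One KD-tree node: a point, its splitting axis, and two subtrees."""

    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a KD-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))


def knn_search(node: Optional[Node], query: Point, k: int,
               heap: List[float]) -> None:
    """Collect the k smallest distances from query into heap.

    heap stores negated distances, so -heap[0] is the largest kept distance.
    """
    if node is None:
        return
    d = math.dist(query, node.point)
    if len(heap) < k:
        heapq.heappush(heap, -d)
    elif d < -heap[0]:
        heapq.heapreplace(heap, -d)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn_search(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th distance (or fewer than k neighbors have been found).
    if len(heap) < k or abs(diff) < -heap[0]:
        knn_search(far, query, k, heap)


def map_partition(partition: List[Point], tree: Optional[Node],
                  k: int) -> List[Tuple[float, Point]]:
    """Map step (assumed): score each point in one partition."""
    scored = []
    for p in partition:
        heap: List[float] = []
        # Ask for k + 1 neighbors because p itself is in the tree at distance 0.
        knn_search(tree, p, k + 1, heap)
        scored.append((-heap[0], p))
    return scored


def reduce_top_n(mapped: List[List[Tuple[float, Point]]],
                 n: int) -> List[Tuple[float, Point]]:
    """Reduce step (assumed): merge mapper outputs and keep the n largest scores."""
    return heapq.nlargest(n, (pair for part in mapped for pair in part))


# Toy data: a 5 x 5 grid of inliers plus one far-away point.
data = [(float(x), float(y)) for x in range(5) for y in range(5)]
data.append((50.0, 50.0))
tree = build_kdtree(data)
partitions = [data[i::3] for i in range(3)]  # three simulated mappers
top = reduce_top_n([map_partition(part, tree, k=3) for part in partitions], n=1)
print(top[0][1])
```

In this sketch each mapper works on its own partition independently, which is what makes the scoring step easy to parallelize. Sharing one tree among all mappers is a simplification that only holds while the tree fits in each worker's memory.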