Map Reduce by K-Nearest Neighbor Joins

2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) Pub Date : 2018-10-01 DOI:10.1109/CYBERC.2018.00050

Srikanth Bethu, B. Babu, S. G. Rao, R. Florence


Citations: 1

Abstract


Knowledge discovery and data mining play a major role in computationally intensive tasks across a wide range of applications. As the volume and dimensionality of data increase, distributed processing is needed to complete operations in a reasonable time. MapReduce programming is well suited to distributed, large-scale data processing and offers several ways to solve the same problem, each with its own constraints and properties. In this paper, we give a comparative analysis of approaches for computing KNN on MapReduce [1], both theoretically and through experimental evaluation. Load balancing, accuracy, and complexity are analyzed at each step: data preprocessing, data partitioning, and computation. The experimental results are produced using a variety of datasets. Time and space complexity are analyzed on each dataset, and the advantages and shortcomings of each algorithm are discussed. Finally, this paper can serve as reference material for handling KNN-based [2] problems with MapReduce in Big Data.
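The three steps named in the abstract (preprocessing, partitioning, computation) can be illustrated with a small in-memory sketch of a block-based kNN join written in map/reduce style. This is not one of the algorithms evaluated in the paper, which the abstract does not name. It is a generic scheme chosen for illustration, and every function name here (`partition`, `map_phase`, `reduce_local`, `reduce_merge`, `knn_join`), the round-robin partitioning, and the Euclidean distance are assumptions.

```python
import heapq
from collections import defaultdict
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def partition(points, n_blocks):
    """Data partitioning (assumed round-robin): split (id, vector) pairs into blocks."""
    blocks = [[] for _ in range(n_blocks)]
    for i, p in enumerate(points):
        blocks[i % n_blocks].append(p)
    return blocks


def map_phase(r_blocks, s_blocks):
    """Map: emit one (key, value) pair per combination of an R block and an S block."""
    for (i, rb), (j, sb) in product(enumerate(r_blocks), enumerate(s_blocks)):
        yield (i, j), (rb, sb)


def reduce_local(pairs, k):
    """First reduce: for each r, keep the k nearest s seen inside one block pair."""
    for _key, (rb, sb) in pairs:
        for r_id, r_vec in rb:
            cands = [(dist(r_vec, s_vec), s_id) for s_id, s_vec in sb]
            yield r_id, heapq.nsmallest(k, cands)


def reduce_merge(local_results, k):
    """Second reduce: merge the partial candidate lists of each r into its final k."""
    merged = defaultdict(list)
    for r_id, cands in local_results:
        merged[r_id].extend(cands)
    return {r_id: heapq.nsmallest(k, c) for r_id, c in merged.items()}


def knn_join(R, S, k, n_blocks=2):
    """For every point in R, find its k nearest neighbors in S."""
    pairs = map_phase(partition(R, n_blocks), partition(S, n_blocks))
    return reduce_merge(reduce_local(pairs, k), k)


# Toy data, invented for this example only.
R = [("r1", (0.0, 0.0)), ("r2", (5.0, 5.0))]
S = [("s1", (1.0, 0.0)), ("s2", (0.0, 2.0)), ("s3", (5.0, 6.0)), ("s4", (9.0, 9.0))]
result = knn_join(R, S, k=2)
print(result)
```

Because every R block is paired with every S block, this scheme returns exact neighbors but does quadratic work in the number of blocks. That is the kind of trade-off between accuracy, load balancing, and complexity the abstract says is compared across approaches.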

