kNN-join Query Processing Algorithm on Mapreduce for Large Amounts of Data
Hyunjo Lee, Jae-Woo Chang, Cheol-Joo Chae
2021 International Symposium on Electrical, Electronics and Information Engineering, 2021-02-19. DOI: 10.1145/3459104.3459192
Recently, the amount of data has been increasing rapidly with the continuous development of computation and communication capabilities. As a result, researchers have actively studied effective schemes for analysing large amounts of data on MapReduce, a framework that supports efficient parallel processing of large-scale data. Among the various queries used for data analysis, the k nearest neighbour (kNN) join query is a representative one: it pairs each point of a dataset R with its k nearest neighbours in another dataset S. However, existing kNN-join schemes on MapReduce incur a high computational cost for constructing and managing index structures. To address this problem, we propose a kNN-join query processing algorithm on MapReduce for analysing large-scale data. First, our algorithm reduces the overhead of constructing the index structure by using seed-based dynamic partitioning. Second, it reduces the computational overhead of finding candidate partitions by using the average distance between pairs of neighbouring seeds. We show that our algorithm outperforms the existing scheme in terms of query processing time.
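The following minimal, single-machine Python sketch illustrates the kNN-join definition and the general idea of seed-based partitioning. It is not the authors' algorithm. Several parts are assumptions made for illustration: seeds are drawn as a random sample of S, and candidate partitions are pruned with a plain triangle-inequality bound. That bound stands in for the paper's average-neighbouring-seed-distance rule, whose details the abstract does not give. All function names are hypothetical. In the sketch, a "map" step routes each S point to its nearest seed. The per-point loop then plays the role of a reducer: it scans candidate partitions in order of their lower-bound distance and stops once no remaining partition can beat the current k-th neighbour.

```python
import heapq
import math
import random
from collections import defaultdict


def knn_join_brute_force(R, S, k):
    """Reference kNN join: for each r in R, its k nearest points in S (Euclidean)."""
    return [heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in R]


def nearest_seed(p, seeds):
    """Index of the seed closest to p; this defines a Voronoi-style partition."""
    return min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))


def knn_join_partitioned(R, S, k, num_seeds, rng_seed=0):
    """Hypothetical seed-partitioned kNN join (illustrative, not the paper's method)."""
    # Assumption: seeds are a random sample of S.
    seeds = random.Random(rng_seed).sample(S, num_seeds)

    # "Map" step: route each S point to its nearest seed's partition and
    # record each partition's radius (max distance from seed to its points).
    parts = defaultdict(list)
    radius = defaultdict(float)
    for s in S:
        i = nearest_seed(s, seeds)
        parts[i].append(s)
        radius[i] = max(radius[i], math.dist(s, seeds[i]))

    result = []
    for r in R:
        # Triangle inequality: every s in partition i satisfies
        # dist(r, s) >= dist(r, seed_i) - radius_i.
        bounds = sorted((max(0.0, math.dist(r, seeds[i]) - radius[i]), i) for i in parts)
        heap = []  # max-heap via negated distances: (-distance, point)
        for lb, i in bounds:
            if len(heap) == k and lb >= -heap[0][0]:
                break  # no remaining partition can beat the current k-th distance
            for s in parts[i]:
                d = math.dist(r, s)
                if len(heap) < k:
                    heapq.heappush(heap, (-d, s))
                elif d < -heap[0][0]:
                    heapq.heapreplace(heap, (-d, s))
        # Return neighbours ordered by increasing distance.
        result.append([s for _, s in sorted(heap, reverse=True)])
    return result
```

The lower bound never exceeds a true distance, so pruning is safe. The partitioned join therefore returns the same neighbour distances as the brute-force reference, and only the number of S points examined per r changes.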