Range-Based Clustering Supporting Similarity Search in Big Data

2015 26th International Workshop on Database and Expert Systems Applications (DEXA), Pub Date: 2015-09-01, DOI: 10.1109/DEXA.2015.41

T. Phan, Markus Jäger, Stefan Nadschläger, J. Küng


Citations: 3

Abstract

State-of-the-art technologies have brought increasingly modern infrastructure and automated processes to the agricultural domain. The data that these systems and remote sensors collect from parcels for further analysis raise the three main challenges of the big data era: volume, variety, and velocity. For similarity search, we propose a range-based clustering method that finds the objects most similar to a given object in large-scale computation with MapReduce. The proposed method groups objects into clusters, which serve as pivots for a pre-check performed before similarity is computed. We also conduct basic experiments to evaluate the performance of the proposed method and to observe how the clusters influence similarity search.
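The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of using clusters as pivots for a pre-check, not the authors' method. It assumes objects are numeric vectors, that dissimilarity is Euclidean distance, and that each cluster keeps a covering radius; the function names `build_clusters` and `range_query` are hypothetical. The grouping step mirrors a map phase (assign each object to a pivot) followed by a reduce phase (collect members per pivot), but here it runs in a single process rather than on a MapReduce cluster.

```python
import math
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two equal-length vectors (an assumed metric)."""
    return math.dist(a, b)


def build_clusters(objects, pivots):
    """Group objects by nearest pivot and record each cluster's covering radius."""
    clusters = defaultdict(list)
    for obj in objects:
        # "Map"-like step: key each object by the index of its nearest pivot.
        nearest = min(range(len(pivots)), key=lambda i: dist(obj, pivots[i]))
        clusters[nearest].append(obj)
    # "Reduce"-like step: per cluster, the largest member-to-pivot distance.
    radii = {
        i: max(dist(obj, pivots[i]) for obj in members)
        for i, members in clusters.items()
    }
    return clusters, radii


def range_query(query, r, pivots, clusters, radii):
    """Return objects within distance r of query, plus the number of objects compared."""
    hits = []
    compared = 0
    for i, members in clusters.items():
        # Pre-check: by the triangle inequality, every member o satisfies
        # dist(query, o) >= dist(query, pivot) - radius, so if that lower
        # bound exceeds r the whole cluster can be skipped.
        if dist(query, pivots[i]) - radii[i] > r:
            continue
        for obj in members:
            compared += 1
            if dist(query, obj) <= r:
                hits.append(obj)
    return hits, compared
```

In this sketch the pre-check costs one distance computation per cluster, and a cluster that fails it is discarded without touching its members. How much work this saves depends on how the pivots are chosen and how tight the clusters are, which is the kind of influence the experiments in the paper are said to examine.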

