Range-Based Clustering Supporting Similarity Search in Big Data

T. Phan, Markus Jäger, Stefan Nadschläger, J. Küng
2015 26th International Workshop on Database and Expert Systems Applications (DEXA), 2015-09-01
DOI: 10.1109/DEXA.2015.41
Citations: 3

Abstract

Thanks to state-of-the-art technologies, more and more modern infrastructure and automated processes now support the agricultural domain. Data collected from parcels by these systems and by remote sensors for further analysis confront the three main challenges of the big data era: big volume, big variety, and big velocity. For similarity search, we propose a range-based clustering method that finds the objects most similar to a given object in large-scale computations with MapReduce. The proposed method groups objects into clusters that serve as pivots for a pre-check before similarity is computed. Furthermore, we conduct some basic experiments to evaluate the performance of the proposed method and to observe the influence of the clusters on similarity search.
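The abstract only outlines how clusters act as pivots for a pre-check. What follows is a minimal sketch under our own assumptions, not the authors' implementation. We assume that each cluster is a fixed-width range of distances to a single pivot point, that distance is Euclidean, and that the query returns every object within radius `eps`. The MapReduce phases are simulated in-process with plain functions. The names `map_phase`, `shuffle`, `reduce_phase` and `range_search` are hypothetical and are not part of Hadoop's API. The pre-check relies on the triangle inequality, |d(q,p) - d(o,p)| <= d(q,o), so a whole cluster, or a single object, can be skipped without computing its distance to the query.

```python
import math
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]
Member = Tuple[str, Vector, float]  # (object id, vector, distance to pivot)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance; any metric obeying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_phase(objects: Dict[str, Vector], pivot: Vector,
              width: float) -> Iterator[Tuple[int, Member]]:
    """Map step (assumed): key each object by its distance range to the pivot.

    Cluster i holds objects with d(o, pivot) in [i * width, (i + 1) * width).
    """
    for oid, vec in objects.items():
        d = euclidean(vec, pivot)
        yield int(d // width), (oid, vec, d)


def shuffle(pairs: Iterator[Tuple[int, Member]]) -> Dict[int, List[Member]]:
    """Group mapped pairs by cluster id, as the framework's shuffle would."""
    groups: Dict[int, List[Member]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(cluster_id: int, members: List[Member], query: Vector,
                 d_qp: float, eps: float, width: float) -> List[Tuple[str, float]]:
    """Reduce step: pre-check the cluster's range, then verify members exactly."""
    lo, hi = cluster_id * width, (cluster_id + 1) * width
    # Cluster-level pre-check: if no pivot distance in [lo, hi) lies within
    # eps of d(q, p), no member can be within eps of the query.
    if d_qp + eps < lo or d_qp - eps >= hi:
        return []
    results = []
    for oid, vec, d_op in members:
        # Object-level pre-check, using only the stored pivot distance.
        if abs(d_qp - d_op) > eps:
            continue
        d = euclidean(query, vec)  # exact similarity only for survivors
        if d <= eps:
            results.append((oid, d))
    return results


def range_search(objects: Dict[str, Vector], query: Vector, pivot: Vector,
                 width: float, eps: float) -> List[Tuple[str, float]]:
    """Return (id, distance) for all objects within eps of query, nearest first."""
    d_qp = euclidean(query, pivot)  # broadcast to every reducer
    groups = shuffle(map_phase(objects, pivot, width))
    hits: List[Tuple[str, float]] = []
    for cid, members in groups.items():
        hits.extend(reduce_phase(cid, members, query, d_qp, eps, width))
    return sorted(hits, key=lambda h: h[1])
```

For example, with `objects = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0), "d": (0.5, 0.5)}`, the call `range_search(objects, (1.0, 0.0), (0.0, 0.0), 1.0, 1.0)` returns `b`, `d` and `a` in that order. The cluster holding `c` (pivot distance of about 7.07) is rejected by the range pre-check, so its distance to the query is never computed. The cluster width trades pruning power against the number of clusters (reduce keys). This sketch does not model how the paper actually chooses its ranges.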