2009 Second International Workshop on Similarity Search and Applications: Latest Papers

Efficient Similarity Search by Reducing I/O with Compressed Sketches
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.22
Arnoldo José Müller Molina, T. Shinohara
Abstract: Sketches are compact bit-string representations of objects. Objects that have the same sketch are stored in the same database bucket. By calculating the Hamming distance between sketches, an estimate of the similarity of their respective objects can be obtained. Objects that are close to each other are expected to have sketches with a small Hamming distance. This estimate helps to schedule the order in which buckets are visited at search time. Recent research has shown that sketches can effectively approximate $L_1$ and $L_2$ distances in high-dimensional settings. A remaining task is to provide a general sketch for arbitrary metric spaces. This paper presents a novel sketch based on generalized hyperplane partitioning that can be employed on arbitrary metric spaces. The core of the sketch is a heuristic that tries to generate balanced partitions. The indexing method AESA stores all the distances among database objects, which allows it to perform a small number of distance computations. Experimental evaluations show that, given a good early termination strategy, our algorithm performs up to one order of magnitude fewer distance operations than AESA in string spaces. Comparisons against other methods show greater gains. Furthermore, we experimentally demonstrate that it is possible to reduce the physical size of the sketches by a factor of ten with different run-length encodings.
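The bucket-ordering idea in the abstract above can be illustrated with a small sketch. This is not the paper's algorithm: the pivot pairs are picked by hand rather than by the balancing heuristic the authors describe, and the names `make_sketch`, `build_buckets`, and `bucket_visit_order` are hypothetical. It only shows how one bit per generalized hyperplane yields a bit-string sketch, and how Hamming distance to the query sketch ranks buckets.

```python
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches stored as ints."""
    return bin(a ^ b).count("1")

def make_sketch(obj, pivot_pairs, dist) -> int:
    """One bit per pivot pair (p, q): set if obj is strictly closer to q.

    Each pair defines a generalized hyperplane; the bit records which
    side of it the object falls on.
    """
    bits = 0
    for i, (p, q) in enumerate(pivot_pairs):
        if dist(obj, q) < dist(obj, p):
            bits |= 1 << i
    return bits

def build_buckets(objects, pivot_pairs, dist):
    """Group objects by sketch: same sketch, same bucket."""
    buckets = defaultdict(list)
    for obj in objects:
        buckets[make_sketch(obj, pivot_pairs, dist)].append(obj)
    return buckets

def bucket_visit_order(query, buckets, pivot_pairs, dist):
    """Order bucket keys by Hamming distance to the query's sketch."""
    qs = make_sketch(query, pivot_pairs, dist)
    return sorted(buckets, key=lambda s: (hamming(s, qs), s))

def abs_diff(a, b):
    return abs(a - b)

# Toy data (assumed for illustration): 1-D points, hand-picked pivot pairs.
points = [1, 2, 9, 10, 20]
pairs = [(0, 10), (5, 25)]
buckets = build_buckets(points, pairs, abs_diff)
order = bucket_visit_order(3, buckets, pairs, abs_diff)
```

A search would then visit buckets in `order` and stop according to some early termination rule, which is left out here because the abstract does not specify one.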
Citations: 19
Optimal Pivots to Minimize the Index Size for Metric Access Methods
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.21
Luis González Ares, N. Brisaboa, María F. Esteller, Oscar Pedreira, Á. Places
Abstract: We consider the problem of similarity search in metric spaces with costly distance functions and large databases. There is a trade-off between the amount of information stored in the index and the reduction in the number of comparisons needed to solve a query. Pivot-based methods clearly outperform clustering-based ones in number of comparisons, but their space requirements are higher, and this can prevent their application to real problems. Therefore, several strategies have been proposed that reduce the space needed by pivot-based methods, such as BAESA, FQA, or KVP. In this paper, we analyze the usefulness of pivots depending on their proximity to the object. As a consequence of this analysis, we propose a new pivot-based method that requires an amount of space equal or very close to that needed by clustering-based methods. We provide experimental results showing that our proposal is a competitive alternative to clustering-oriented solutions when using the same amount of memory.
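For readers unfamiliar with pivot-based methods, the following is a minimal sketch of the generic technique the abstract above builds on, not of the space-reduced method the authors propose. It stores every object-to-pivot distance (the full table whose size the paper aims to shrink) and uses the triangle inequality to discard objects without computing their distance to the query. The function names and toy data are assumptions for illustration.

```python
def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p."""
    return [[dist(o, p) for p in pivots] for o in objects]

def range_query(query, radius, objects, pivots, table, dist):
    """Return (matches, number of distance computations at query time)."""
    q_to_p = [dist(query, p) for p in pivots]
    computed = len(pivots)
    matches = []
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for any pivot p,
        # so the largest such gap is a lower bound on d(q, o).
        lower_bound = max(abs(qp - op) for qp, op in zip(q_to_p, row))
        if lower_bound > radius:
            continue  # discarded without computing d(q, o)
        computed += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, computed

def abs_diff(a, b):
    return abs(a - b)

# Toy data (assumed): 1-D points with two pivots.
objects = [1, 4, 8, 15, 30]
pivots = [0, 20]
table = build_pivot_table(objects, pivots, abs_diff)
matches, computed = range_query(5, 2, objects, pivots, table, abs_diff)
```

The table holds one stored distance per object per pivot, which is the space cost the abstract contrasts with clustering-based indexes.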
Citations: 16
Metric Index: An Efficient and Scalable Solution for Similarity Search
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.26
David Novak, Michal Batko
Abstract: Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient, and scalable solution for metric data management is still an open research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches, the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving very high recall as the dataset grows.
Citations: 52
Text-Based and Content-Based Image Retrieval on Flickr: DEMO
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.30
J. M. Barrios, Diego Diaz-Espinoza, B. Bustos
Abstract: We present an image retrieval system based on a combined search of text and content. The idea is to use the text present in the title, description, and tags of the images to improve the results obtained with a standard content-based search. The system contains two different user interfaces: a browser sidebar designed for end users, where the user enters the Flickr URL they are visiting and the system retrieves similar images from the collection, and an advanced search designed for experienced users, where the distance functions and weights can be customized.
Citations: 26
EGNAT: A Fully Dynamic Metric Access Method for Secondary Memory
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.20
Roberto Uribe, G. Navarro
Abstract: We introduce a novel metric space search data structure called EGNAT, which is fully dynamic and designed for secondary memory. The EGNAT is based on Brin's GNAT static index, and partitions the space according to hyperplanes. The EGNAT implements deletions using a novel technique dubbed Ghost Hyperplanes, which is of independent interest for other metric space indexes. We show experimentally that the EGNAT is competitive with the M-tree, the baseline for this scenario.
Citations: 13
Principles of Information Filtering in Metric Spaces
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.11
P. Ciaccia, M. Patella
Abstract: The traditional problem of similarity search requires finding, within a set of points, those that are closest to a query point $q$ according to a distance function $d$. In this paper we introduce the novel problem of metric filtering: in this scenario, each data point $x_i$ possesses its own distance function $d_i$, and the task is to find those points that are close enough, according to $d_i$, to a query point $q$. This minor difference in the problem formulation introduces a series of challenges from the point of view of efficient evaluation. We provide basic definitions and alternative pivot-based resolution strategies, and present results from preliminary experiments showing that the proposed solutions are indeed effective in reducing evaluation costs.
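To make the problem statement above concrete, here is a brute-force baseline for metric filtering. It is only an illustration of the formulation, not one of the pivot-based strategies the paper proposes. Two assumptions are added: "close enough" is modeled as a per-point threshold $r_i$, and each $d_i$ is a weighted $L_1$ distance. The names `weighted_l1` and `metric_filter` are hypothetical.

```python
def weighted_l1(weights):
    """Build a weighted L1 distance; positive weights keep it a metric."""
    def d(x, y):
        return sum(w * abs(a - b) for w, a, b in zip(weights, x, y))
    return d

def metric_filter(query, entries):
    """Return every point x_i whose own distance d_i(x_i, query) is <= r_i.

    entries is a list of (x_i, d_i, r_i) triples. This evaluates every
    d_i, which is the cost an index for this problem would try to avoid.
    """
    return [x for x, d, r in entries if d(x, query) <= r]

# Toy data (assumed): two points share coordinates but weigh them differently.
entries = [
    ((0.0, 0.0), weighted_l1((1.0, 1.0)), 1.5),
    ((0.0, 0.0), weighted_l1((3.0, 1.0)), 1.5),
    ((2.0, 2.0), weighted_l1((1.0, 0.1)), 1.5),
]
matches = metric_filter((1.0, 0.0), entries)
```

The first two entries show why the problem differs from ordinary range search: identical coordinates can give different answers because the distance function belongs to the data point, not to the query.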
Citations: 2
Analyzing Metric Space Indexes: What For?
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.17
G. Navarro
Abstract: It has been a long way since the beginnings of metric space searching, when people coming from algorithmics tried to apply their background to this new paradigm, with variable, and especially difficult to explain, success or lack of it. Since then, something has been learned about the specifics of the problem, in particular regarding key aspects such as the intrinsic dimensionality, which were not well understood in the beginning. The inclusion of those aspects in the picture has led to the most important developments in the area. Similarly, researchers have tried to apply asymptotic analysis concepts to understand and predict the performance of the data structures. Again, it was soon clear that this was insufficient, and that the characteristics of the metric space itself could not be neglected. Although some progress has been made on understanding concepts such as the curse of dimensionality, modern researchers seem to have given up on using asymptotic analysis. They rely on experiments, or at best on detailed cost models that are good predictors but do not explain why the data structures perform the way they do. In this paper I will argue that this is a big loss. Even if the predictive capability of asymptotic analysis is poor, it constitutes a great tool for understanding the algorithmic concepts behind the different data structures, and gives powerful hints for the design of new ones. I will exemplify my view by recollecting what is known about asymptotic analysis of metric indexes, and will add some new results.
Citations: 28
Dynamic P2P Indexing and Search Based on Compact Clustering
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.32
Mauricio Marín, V. Gil-Costa, Cecilia Hernández
Abstract: We propose a strategy to perform query processing on P2P similarity search systems based on peers and super-peers. We show that by approximating global but summarized information about the indexed data in each peer, the average amount of computation and communication performed to solve range queries can be significantly reduced compared to alternative state-of-the-art strategies based on local indexing at peer level. We illustrate our technique by using an indexing method based on compact clustering.
Citations: 12
MUFIN: A Multi-feature Indexing Network
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.24
Michal Batko, Vlastislav Dohnal, David Novak, J. Sedmidubský
Abstract: It has become customary that practically any information can be in a digital form. However, searching for relevant information can be complicated because of (1) the diversity of ways in which specific data can be sorted, compared, related, or classified, and (2) the exponentially increasing amount of digital data. Accordingly, a successful search engine should address problems of extensibility and scalability. The Multi-Feature Indexing Network (MUFIN) is a general-purpose search engine that satisfies these requirements. The extensibility is ensured by adopting the metric space to model similarity, so MUFIN can evaluate queries over a wide variety of data domains compared by metric distance functions. The scalability is achieved by utilizing the paradigm of structured peer-to-peer networks, where the computational workload of query execution is distributed over multiple independent peers which can work in parallel. We demonstrate these unique capabilities of MUFIN on a database of 100 million images indexed according to a combination of five MPEG-7 descriptors.
Citations: 14
CoPhIR Image Collection under the Microscope
2009 Second International Workshop on Similarity Search and Applications Pub Date : 2009-08-29 DOI: 10.1109/SISAP.2009.25
Michal Batko, Petra Budíková, David Novak
Abstract: The Content-based Photo Image Retrieval (CoPhIR) dataset is the largest available database of digital images with corresponding visual descriptors. It contains five MPEG-7 global descriptors extracted from more than 106 million images from the Flickr photo-sharing system. In this paper, we analyze this dataset focusing on (1) the efficiency of similarity-based indexing and searching and (2) the expressiveness of the combination of the descriptors with respect to subjective perception of visual similarity. We treat the descriptors as metric spaces and then combine them into a multi-metric space. We analyze distance distributions of individual descriptors, measure the intrinsic dimensionality of these datasets, and statistically evaluate the correlation between these descriptors. Further, we use two methods to assess subjective accuracy and satisfaction of similarity retrieval based on a combination of descriptors that is recommended for CoPhIR, and we compare these results on databases of 10 and 100 million CoPhIR images. Finally, we suggest, explore, and evaluate two approaches to improve the accuracy: (1) applying logarithms in order to weaken the influence of a single descriptor's contribution if it deviates from the rest, and (2) categorizing the dataset and identifying visual characteristics important for individual categories.
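The logarithm idea mentioned at the end of the abstract above can be illustrated as follows. The abstract does not give the formula, so the form used here, a weighted sum of `log(1 + d)` terms, is an assumption, as are the function names, the equal weights, and the toy distances. The point is only to show how damping each descriptor's distance keeps one deviating descriptor from dominating the combined score.

```python
import math

def weighted_sum(distances, weights):
    """Plain multi-metric combination: sum of w_i * d_i."""
    return sum(w * d for w, d in zip(weights, distances))

def log_weighted_sum(distances, weights):
    """Assumed damped variant: sum of w_i * log(1 + d_i).

    log(1 + d) is concave, non-decreasing, and zero at d = 0, so each
    term is still a metric, and a single large d_i contributes far less
    than it would in the plain sum.
    """
    return sum(w * math.log1p(d) for w, d in zip(weights, distances))

# Toy per-descriptor distances from a query to two candidate images (assumed).
weights = [1.0] * 5
image_a = [1.0, 1.0, 1.0, 1.0, 1.0]   # moderately far on every descriptor
image_b = [0.2, 0.2, 0.2, 0.2, 4.5]   # close on four, far on one

plain_prefers_a = weighted_sum(image_a, weights) < weighted_sum(image_b, weights)
log_prefers_b = log_weighted_sum(image_b, weights) < log_weighted_sum(image_a, weights)
```

With these numbers the plain sum ranks `image_a` first, while the damped sum ranks `image_b` first, because the one outlying descriptor no longer outweighs agreement on the other four.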
Citations: 31