2009 Second International Workshop on Similarity Search and Applications: Latest Publications

Efficient Similarity Search by Reducing I/O with Compressed Sketches

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.22

Arnoldo José Müller Molina, T. Shinohara

Abstract: Sketches are compact bit-string representations of objects. Objects that have the same sketch are stored in the same database bucket. Computing the Hamming distance between two sketches yields an estimate of the similarity of the corresponding objects: objects that are close to each other are expected to have sketches with small Hamming distances. This estimate helps to schedule the order in which buckets are visited at search time. Recent research has shown that sketches can effectively approximate $L_1$ and $L_2$ distances in high-dimensional settings. A remaining task is to provide a general sketch for arbitrary metric spaces. This paper presents a novel sketch based on generalized hyperplane partitioning that can be employed in arbitrary metric spaces. The core of the sketch is a heuristic that tries to generate balanced partitions. The indexing method AESA stores all the distances among database objects, which allows it to perform a small number of distance computations. Experimental evaluations show that, given a good early termination strategy, our algorithm performs up to one order of magnitude fewer distance computations than AESA in string spaces. Comparisons against other methods show greater gains. Furthermore, we experimentally demonstrate that it is possible to reduce the physical size of the sketches by a factor of ten with different run-length encodings.
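To make the sketch idea concrete, here is a minimal Python sketch under several assumptions. Each bit records on which side of a generalized hyperplane between two pivots $p_i$ and $q_i$ an object falls, objects with equal sketches share a bucket, and buckets are visited in increasing Hamming distance from the query's sketch. The pivot pairs are fixed arbitrarily here; this is not the authors' balancing heuristic. All names (`make_sketch`, `knn`, `run_length_encode`, the `max_buckets` cut-off) are hypothetical, Levenshtein distance stands in for the string spaces mentioned in the abstract, and the run-length encoder is a generic one rather than the specific encodings evaluated in the paper.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Dist = Callable[[str, str], int]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the example metric on strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def make_sketch(obj: str, pivot_pairs: Sequence[Tuple[str, str]], dist: Dist) -> int:
    """Bit i is 1 if obj lies on p_i's side of the hyperplane between p_i and q_i."""
    bits = 0
    for i, (p, q) in enumerate(pivot_pairs):
        if dist(obj, p) < dist(obj, q):
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two sketches differ."""
    return bin(a ^ b).count("1")


def build_buckets(db: Sequence[str], pivot_pairs, dist: Dist) -> Dict[int, List[str]]:
    """Objects with identical sketches share one bucket."""
    buckets: Dict[int, List[str]] = {}
    for obj in db:
        buckets.setdefault(make_sketch(obj, pivot_pairs, dist), []).append(obj)
    return buckets


def knn(query: str, k: int, buckets: Dict[int, List[str]], pivot_pairs, dist: Dist,
        max_buckets: Optional[int] = None) -> List[Tuple[int, str]]:
    """Scan buckets in increasing Hamming distance to the query's sketch."""
    qs = make_sketch(query, pivot_pairs, dist)
    order = sorted(buckets, key=lambda s: hamming(qs, s))
    if max_buckets is not None:  # naive early termination (an assumption)
        order = order[:max_buckets]
    scored = [(dist(query, o), o) for s in order for o in buckets[s]]
    scored.sort(key=lambda t: t[0])
    return scored[:k]


def run_length_encode(bits: str) -> List[Tuple[str, int]]:
    """Generic run-length encoding of a bit string into (bit, run length) pairs."""
    runs: List[Tuple[str, int]] = []
    for ch in bits:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


if __name__ == "__main__":
    words = ["cat", "car", "cart", "card", "cord", "dog", "dot", "door", "bird", "bard"]
    pairs = [("cat", "dog"), ("car", "bird"), ("door", "card")]  # arbitrary pivot pairs
    buckets = build_buckets(words, pairs, edit_distance)
    print(knn("cart", 2, buckets, pairs, edit_distance))
    print(run_length_encode(format(make_sketch("cart", pairs, edit_distance), "03b")))
```

With `max_buckets=None` every bucket is scanned, so the answer is exact; a smaller value trades recall for fewer distance computations, which is the role an early-termination strategy plays.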

Citations: 19

Optimal Pivots to Minimize the Index Size for Metric Access Methods

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.21

Luis González Ares, N. Brisaboa, María F. Esteller, Oscar Pedreira, Á. Places

Citations: 16

Metric Index: An Efficient and Scalable Solution for Similarity Search

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.26

David Novak, Michal Batko

Abstract: Metric space, as a universal and versatile model of similarity, can be applied in various areas of non-text information retrieval. However, a general, efficient, and scalable solution for metric data management remains an open research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering. The heart of the M-Index is a general mapping mechanism that makes it possible to store the data in well-established structures such as the B+-tree, or even in distributed storage. We have implemented the M-Index with a B+-tree and performed experiments on a combination of five MPEG-7 descriptors in a database of hundreds of thousands of digital images. The experiments test several M-Index variants and compare them with two orthogonal approaches, the PM-Tree and the iDistance. The trials show that the M-Index outperforms the others in terms of search-space pruning efficiency, I/O costs, and response times for precise similarity queries. Furthermore, the M-Index demonstrates an excellent ability to keep similar data close together in the index, which makes its approximation algorithm very efficient: it maintains practically constant response times while preserving a very high recall as the dataset grows.
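The abstract does not spell out the mapping, so the following is only a sketch under stated assumptions: a single-level mapping in the style of iDistance, where each object $o$ is assigned to its closest pivot $p_i$ and stored under the key $d(o, p_i) + i \cdot c$, with $c$ a constant larger than any distance so that cells occupy disjoint key intervals. A sorted Python list searched with `bisect` stands in for the B+-tree, and the class name `MIndexSketch` and the parameter `c` are hypothetical. The paper's further pruning and filtering rules, multi-level partitioning, and approximate search are omitted.

```python
import bisect
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def l2(a: Point, b: Point) -> float:
    """Euclidean distance, used here as the example metric."""
    return math.dist(a, b)


class MIndexSketch:
    """Hypothetical single-level mapping: key(o) = d(o, p_i) + i * c.

    c must exceed every possible distance so that cell i owns the key
    interval [i * c, (i + 1) * c). A sorted list stands in for a B+-tree.
    """

    def __init__(self, pivots: Sequence[Point], dist: Callable[[Point, Point], float], c: float):
        self.pivots = list(pivots)
        self.dist = dist
        self.c = c
        self.keys: List[float] = []
        self.objs: List[Point] = []

    def _key(self, obj: Point) -> float:
        ds = [self.dist(obj, p) for p in self.pivots]
        i = min(range(len(ds)), key=ds.__getitem__)  # closest pivot picks the cell
        return i * self.c + ds[i]

    def insert(self, obj: Point) -> None:
        k = self._key(obj)
        pos = bisect.bisect_right(self.keys, k)
        self.keys.insert(pos, k)
        self.objs.insert(pos, obj)

    def range_query(self, q: Point, r: float) -> List[Point]:
        result: List[Point] = []
        for i, p in enumerate(self.pivots):
            dq = self.dist(q, p)
            # Only keys of cell i whose pivot distance lies in [dq - r, dq + r].
            start = bisect.bisect_left(self.keys, i * self.c + max(0.0, dq - r))
            end = min(bisect.bisect_right(self.keys, i * self.c + dq + r),
                      bisect.bisect_left(self.keys, (i + 1) * self.c))
            for o in self.objs[start:end]:
                if self.dist(q, o) <= r:  # refine candidates with the real distance
                    result.append(o)
        return result


if __name__ == "__main__":
    index = MIndexSketch([(0, 0), (10, 10), (0, 10)], l2, c=100.0)
    for x in range(11):
        for y in range(11):
            index.insert((x, y))
    print(sorted(index.range_query((5, 5), 1.5)))
```

The range query relies on the triangle inequality: if $d(q, o) \le r$ then $|d(o, p_i) - d(q, p_i)| \le r$, so only one contiguous key interval per cell has to be read, which is what lets an ordinary one-dimensional structure such as a B+-tree answer metric queries.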

Citations: 52

Text-Based and Content-Based Image Retrieval on Flickr: DEMO

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.30

J. M. Barrios, Diego Diaz-Espinoza, B. Bustos

Citations: 26

EGNAT: A Fully Dynamic Metric Access Method for Secondary Memory

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.20

Roberto Uribe, G. Navarro

Citations: 13

Principles of Information Filtering in Metric Spaces

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.11

P. Ciaccia, M. Patella

Citations: 2

Analyzing Metric Space Indexes: What For?

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.17

G. Navarro

Abstract: Metric space searching has come a long way since its beginnings, when people coming from algorithmics tried to apply their background to this new paradigm, obtaining variable, and especially hard to explain, success or lack of it. Since then, something has been learned about the specifics of the problem, in particular regarding key aspects such as intrinsic dimensionality, which were not well understood in the beginning. Bringing those aspects into the picture has led to the most important developments in the area. Similarly, researchers have tried to apply asymptotic analysis concepts to understand and predict the performance of the data structures. Again, it soon became clear that this was insufficient, and that the characteristics of the metric space itself could not be neglected. Although some progress has been made in understanding concepts such as the curse of dimensionality, modern researchers seem to have given up on asymptotic analysis. They rely on experiments or, at best, on detailed cost models that are good predictors but do not explain why the data structures perform the way they do. In this paper I will argue that this is a big loss. Even if the predictive capability of asymptotic analysis is poor, it is a great tool for understanding the algorithmic concepts behind the different data structures, and it gives powerful hints for the design of new ones. I will exemplify my view by recollecting what is known about the asymptotic analysis of metric indexes, and will add some new results.

Citations: 28

Dynamic P2P Indexing and Search Based on Compact Clustering

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.32

Mauricio Marín, V. Gil-Costa, Cecilia Hernández

Citations: 12

MUFIN: A Multi-feature Indexing Network

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.24

Michal Batko, Vlastislav Dohnal, David Novak, J. Sedmidubský

Citations: 14

CoPhIR Image Collection under the Microscope

2009 Second International Workshop on Similarity Search and Applications. Pub Date: 2009-08-29. DOI: 10.1109/SISAP.2009.25

Michal Batko, Petra Budíková, David Novak

Abstract: The Content-based Photo Image Retrieval (CoPhIR) dataset is the largest available database of digital images with corresponding visual descriptors. It contains five MPEG-7 global descriptors extracted from more than 106 million images from the Flickr photo-sharing system. In this paper, we analyze this dataset focusing on 1) the efficiency of similarity-based indexing and searching and 2) the expressiveness of combinations of the descriptors with respect to the subjective perception of visual similarity. We treat the descriptors as metric spaces and then combine them into a multi-metric space. We analyze the distance distributions of individual descriptors, measure the intrinsic dimensionality of these datasets, and statistically evaluate the correlation between the descriptors. Further, we use two methods to assess the subjective accuracy of, and user satisfaction with, similarity retrieval based on the combination of descriptors recommended for CoPhIR, and we compare these results on databases of 10 and 100 million CoPhIR images. Finally, we suggest, explore, and evaluate two approaches to improving accuracy: 1) applying logarithms to weaken the influence of a single descriptor's contribution when it deviates from the rest, and 2) categorizing the dataset and identifying the visual characteristics that are important for individual categories.
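As an illustration only, the sketch below shows two ideas from the abstract under assumptions of our own: combining per-descriptor distances into one multi-metric distance by a weighted sum, and a log-damped variant in which each partial distance $d$ is replaced by $\log(1 + d)$ so that a single deviating descriptor cannot dominate. The exact formula used in the paper and the weights recommended for CoPhIR are not given here; the equal weights and partial distances are placeholders, and all function names are hypothetical. The intrinsic dimensionality estimate uses the common $\rho = \mu^2 / (2\sigma^2)$ definition of Chávez et al., which may differ from the measure the authors applied.

```python
import math
import statistics
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def combined_distance(partial: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-descriptor distances (a multi-metric aggregate)."""
    return sum(weights[name] * d for name, d in partial.items())


def log_combined_distance(partial: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed log-damped variant: log(1 + d) shrinks large partial distances most."""
    return sum(weights[name] * math.log1p(d) for name, d in partial.items())


def pairwise_distances(objs: Sequence[T], dist: Callable[[T, T], float]) -> List[float]:
    """All pairwise distances of a (small) sample, as input for the estimate below."""
    return [dist(a, b) for a, b in combinations(objs, 2)]


def intrinsic_dimensionality(distances: Sequence[float]) -> float:
    """rho = mu^2 / (2 * sigma^2) over a sample of distances (Chavez et al.)."""
    mu = statistics.fmean(distances)
    var = statistics.pvariance(distances, mu)
    if var == 0:
        raise ValueError("all distances are equal; rho is undefined")
    return mu * mu / (2 * var)


if __name__ == "__main__":
    # Placeholder partial distances and equal weights, not CoPhIR's recommended ones.
    partial = {"ColorLayout": 1.0, "EdgeHistogram": 1.0, "ScalableColor": 100.0}
    weights = {name: 1.0 for name in partial}
    print(combined_distance(partial, weights))  # 102.0
    print(log_combined_distance(partial, weights))
    print(intrinsic_dimensionality(pairwise_distances([0.0, 1.0, 3.0], lambda a, b: abs(a - b))))
```

Because $\log(1 + d)$ is concave, increasing, and zero at zero, applying it to each metric preserves the triangle inequality, so the damped weighted sum is still a metric and metric indexes remain applicable.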

Citations: 31