{"title":"Parallel Dynamic Batch Loading in the M-tree","authors":"Jakub Lokoč","doi":"10.1109/SISAP.2009.27","DOIUrl":"https://doi.org/10.1109/SISAP.2009.27","url":null,"abstract":"Although metric access methods (MAMs) proved their capabilities when performing efficient similarity search, their further performance improvement is needed due to extreme growth of data volumes. Since multi core processors become widely available, it is justified to exploit parallelism. However, taking into account the Gustafson’s law, it is necessary to find tasks suitable for parallelization. Such a task could be M-tree construction. Unfortunately, parallelism during an object insertion in hierarchical index structures is limited by a node capacity. It is much less restrictive to run several independent insertions in parallel. However, synchronization problems occur whenever a node is about to split. In this paper we present our new technique of M-tree construction. The technique postpones splitting of overfull nodes and thus allows simple parallelization of M-tree construction. We also utilize an adaptation of recently introduced re-inserting technique in the M-tree. Our experiments confirm the new technique guarantees significant speed up of M-tree construction and also improves the quality of the index.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query Routing Mechanisms in Self-Organizing Search Systems","authors":"Vlastislav Dohnal, J. Sedmidubský","doi":"10.1109/SISAP.2009.13","DOIUrl":"https://doi.org/10.1109/SISAP.2009.13","url":null,"abstract":"We analyze routing mechanisms of a self-organizing semantic overlay for content-based search in multimedia data. This overlay operates over any existing P2P network based on the metric space approach. In particular, we replace the previous design of routing mechanisms in Metric Semantic Overlay (MSO) with a new adaptive query-routing algorithm. An advantage of it lies in an automatic tuning of confusability of queries that is used to select peers during query evaluation. These improvements are experimentally evaluated on a real-life and synthetic dataset.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"162 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114867592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Img(Rummager): An Interactive Content Based Image Retrieval System","authors":"S. Chatzichristofis, Y. Boutalis, M. Lux","doi":"10.1109/SISAP.2009.16","DOIUrl":"https://doi.org/10.1109/SISAP.2009.16","url":null,"abstract":"This paper presents an image retrieval suite called img(Rummager) which brings into effect a number of new as well as state of the art descriptors. The application can execute an image search based on a query image, either from XML-based index files, or directly from a folder containing image files, extracting the comparison features in real time. In addition the img(Rummager) application can execute a hybrid search of images from the application server, combining keyword information and visual similarity. Also img(Rummager) supports easy retrieval evaluation based on the normalized modified retrieval rank (NMRR) and average precision (AP).","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116699129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximate Direct and Reverse Nearest Neighbor Queries, and the k-nearest Neighbor Graph","authors":"Karina Figueroa, Rodrigo Paredes","doi":"10.1109/SISAP.2009.33","DOIUrl":"https://doi.org/10.1109/SISAP.2009.33","url":null,"abstract":"Retrieving the k-nearest neighbors of a query object is a basic primitive in similarity searching. A related, far less explored primitive is to obtain the dataset elements which would have the query object within their own k-nearest neighbors, known as the reverse k-nearest neighbor query. We already have indices and algorithms to solve k-nearest neighbors queries in general metric spaces; yet, in many cases of practical interest they degenerate to sequential scanning. The naive algorithm for reverse k-nearest neighbor queries has quadratic complexity, because the k-nearest neighbors of all the dataset objects must be found; this is too expensive. Hence, when solving these primitives we can tolerate trading correctness in the solution for searching time. In this paper we propose an efficient approximate approach to solve these similarity queries with high retrieval rate. Then, we show how to use our approximate k-nearest neighbor queries to construct (an approximation of) the k-nearest neighbor graph when we have a fixed dataset. Finally, combining both primitives we show how to dynamically maintain the approximate k-nearest neighbor graph of the objects currently stored within the metric dataset, that is, considering both object insertions and deletions.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128136415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Spatial Approximation Trees for Massive Data","authors":"G. Navarro, Nora Reyes","doi":"10.1109/SISAP.2009.28","DOIUrl":"https://doi.org/10.1109/SISAP.2009.28","url":null,"abstract":"Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic, that is, few of them tolerate insertion of elements at reasonable cost over an existing index and only a few work efficiently in secondary memory. In this paper we introduce a secondary-memory variant of the Dynamic Spatial Approximation Tree, which has shown to be competitive in main memory. The resulting index handles well the secondary memory scenario and is competitive with the state of the art, becoming a useful alternative in a wide range of database applications. Moreover, our ideas are applicable to other secondary-memory trees where there is little control over the tree shape.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127838072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding Up Permutation Based Indexing with Indexing","authors":"Karina Figueroa, K. Fredriksson","doi":"10.1109/SISAP.2009.12","DOIUrl":"https://doi.org/10.1109/SISAP.2009.12","url":null,"abstract":"A recent probabilistic approach for searching in high dimensional metric spaces is based on predicting the distances between database elements according to how they order their distances towards some set of distinguished elements, called permutants. In the preprocessing phase a set of permutants is chosen, and are sorted (permuted) by their distances against every database element. The permutations form the index. When a query is given, its corresponding permutation is computed, and --- as similar elements will (probably) have a similar permutation --- the database is compared in the order induced by the similarity between permutations. This works well but has relatively high CPU time due to computing the distances between permutations and (partially) sorting the database by the similarity. We improve this by identifying and solving this as another metric space problem. This avoids many distance computations between the permutants. The experimental results show that this works extremely well in practice.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134462426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Effectiveness of Distances Measuring Protein Structure Similarity","authors":"Jakub Galgonek, D. Hoksza","doi":"10.1109/SISAP.2009.19","DOIUrl":"https://doi.org/10.1109/SISAP.2009.19","url":null,"abstract":"Determining similarity between two protein structures is one of the most fundamental problems in contemporary structural bioinformatics. With the increasing complexity of the measures, their effectiveness increases as well. However, other important observables, such as the degree of metric properties fulfilment, could rather deteriorate than improve. In this paper we introduce an effective measure and study its degree of metric properties fulfilment.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"163 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127364249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural Entropic Difference: A Bounded Distance Metric for Unordered Trees","authors":"R. Connor, Fabio Simeoni, Michael Iakovos","doi":"10.1109/SISAP.2009.29","DOIUrl":"https://doi.org/10.1109/SISAP.2009.29","url":null,"abstract":"We show a new metric for comparing unordered, tree-structured data. While such data is increasingly important in its own right, the methodology underlying the construction of the metric is generic and may be reused for other classes of ordered and partially ordered data. The metric is based on the information content of the two values under consideration, which is measured using Shannon's entropy equations. In essence, the more commonality the values possess, the closer they are. As values in this domain may have no commonality, a good metric should be bounded to represent this. This property has been achieved, but is in tension with triangle inequality.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133082320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Searching by Similarity and Classifying Images on a Very Large Scale","authors":"Giuseppe Amato, P. Savino","doi":"10.1109/SISAP.2009.10","DOIUrl":"https://doi.org/10.1109/SISAP.2009.10","url":null,"abstract":"In the demonstration we will show a system for searching by similarity and automatically classifying images in a very large dataset. The demonstrated techniques are based on the use of the MI-File (Metric Inverted File) as the access method for executing similarity search efficiently. The MI-File is an access method based on inverted files that relies on a space transformation that uses the notion of perspective to decide about the similarity between two objects. More specifically, if two objects are close to each other, the view of the space from their positions is also similar. Leveraging this space transformation, it is possible to use inverted files to execute approximate similarity search. In order to test the scalability of this access method, we inserted 106 million images from the CoPhIR dataset and we created an on-line search engine that allows everybody to search in this dataset. In addition we also used this access method to perform automatic classification on this very large image dataset. More specifically, we reformulated the classification problem, as resulting from the use of SVM with RBF kernel, as a complex approximate similarity search problem. In such a way, instead of comparing every single image against the classifier, the best images belonging to a class are directly obtained as the result of a complex approximate similarity search query.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134388430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"img(Anaktisi): A Web Content Based Image Retrieval System","authors":"Konstantinos Zagoris, Savvas A. Chatzichristofis, Nikos Papamarkos, Yiannis S. Boutalis","doi":"10.1109/SISAP.2009.15","DOIUrl":"https://doi.org/10.1109/SISAP.2009.15","url":null,"abstract":"img(Anaktisi) is a C#/.NET content based image retrieval application suitable for the web. It provides efficient retrieval services for various image databases using as a query a sample image, an image sketched by the user and keywords. The image retrieval engine is powered by innovative compact and effective descriptors. Also, an Auto Relevance Feedback (ARF) technique is provided to the user. This technique readjusts the initial retrieval results based on user preferences improving the retrieval score significantly. img(Anaktisi) can be found at http://www.anaktisi.net","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134390147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}