{"title":"Combinatorial Framework for Similarity Search","authors":"Y. Lifshits","doi":"10.1109/SISAP.2009.31","DOIUrl":"https://doi.org/10.1109/SISAP.2009.31","url":null,"abstract":"We present an overview of combinatorial framework for similarity search. An algorithm is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Namely, the input dataset is represented by a comparison oracle that given any three points X,Y,Z answers whether Y or Z is closer to X. We assume that the similarity order of the dataset satisfies the four variations of the following disorder inequality: if X is the A'th most similar object to Y and Y is the B'th most similar object to Z, then X is among the D(A+B) most similar objects to Z, where D is a relatively small disorder constant. Combinatorial algorithms for nearest neighbor search have two important advantages: (1) they do not map similarity values to artificial distance values and do not use triangle inequality for the latter, and (2) they work for arbitrarily complicated data representations and similarity functions. Ranwalk, the first known combinatorial solution for nearest neighbors, is randomized, exact, zero-error algorithm with query time that is logarithmic in number of objects. But Ranwalk preprocessing time is quadratic. Later on, another solution, called combinatorial nets, was discovered. It is deterministic and exact algorithm with almost linear time and space complexity of preprocessing, and near-logarithmic time complexity of search. Combinatorial nets also have a number of side applications. For near-duplicate detection they lead to the first known deterministic algorithm that requires just near-linear time + time proportional to the size of output. For any dataset with small disorder combinatorial nets can be used to construct a visibility graph: the one in which greedy routing deterministically converges to the nearest neighbor of a target in logarithmic number of steps. The later result is the first known work-around for Navarro's impossibility of generalizing Delaunay graphs.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130885304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Tuneable Fuzzy Similarity in Non-metric Search","authors":"P. Vojtás, A. Eckhardt","doi":"10.1109/SISAP.2009.18","DOIUrl":"https://doi.org/10.1109/SISAP.2009.18","url":null,"abstract":"We propose an alternate method for indexing data for answering queries in non-metric spaces. The traditional use of distance and triangle inequality is substituted with the use of fuzzy similarity fulfilling the transitivity property with a tuneable fuzzy conjunctor. In a non-metric space it is still possible that there is a fuzzy conjunctor such that transitivity holds and usual indexing techniques based on pivots for range queries can be applied.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128350587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MiPai: Using the PP-Index to Build an Efficient and Scalable Similarity Search System","authors":"Andrea Esuli","doi":"10.1109/SISAP.2009.14","DOIUrl":"https://doi.org/10.1109/SISAP.2009.14","url":null,"abstract":"MiPai is an image search system that provides visual similarity search and text-based search functionalities. The similarity search functionality is implemented by means of the Permutation Prefix Index (PP-Index), a novel data structure for approximate similarity search. The text-based search functionality is based on a traditional inverted list index data structure. MiPai also provides a combined visual similarity/text search function.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123788810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curse of Dimensionality in Pivot Based Indexes","authors":"I. Volnyansky, V. Pestov","doi":"10.1109/SISAP.2009.9","DOIUrl":"https://doi.org/10.1109/SISAP.2009.9","url":null,"abstract":"We offer a theoretical validation of the curse of dimensionality in the pivot-based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot-based indexing schemes is performed on a sequence of datasets modeled as samples picked in i.i.d. fashion from a sequence of metric spaces. We allow the size of the dataset to grow in relation to dimension, such that the dimension is superlogarithmic but subpolynomial in the size of the dataset. The number of pivots is sublinear in the size of the dataset. We pick the least restrictive cost model of similarity search where we count each distance calculation as a single computation and disregard the rest. We demonstrate that if the intrinsic dimension of the spaces in the sense of concentration of measure phenomenon is linear in dimension, then the performance of similarity search pivot-based indexes is asymptotically linear in the size of the dataset.","PeriodicalId":130242,"journal":{"name":"2009 Second International Workshop on Similarity Search and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129294797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}