A non-linear dimensionality-reduction technique for fast similarity search in large databases
Khanh Vu, K. Hua, Hao Cheng, S. Lang
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Published: 2006-06-27
DOI: 10.1145/1142473.1142532 (https://doi.org/10.1145/1142473.1142532)
Citations: 36
Abstract
To enable efficient similarity search in large databases, many indexing techniques use a linear transformation scheme to reduce dimensionality and allow fast approximation. In this reduction approach the approximation is unbounded, so the approximation volume extends across the dataspace. This causes over-estimation of retrieval sets and impairs performance. This paper presents a non-linear transformation scheme that extracts two important parameters specifying the data. We prove that these parameters correspond to a bounded volume around the search sphere, irrespective of dimensionality. We use a special workspace-mapping mechanism to derive tight bounds for the parameters and to prove further results, and we highlight insights into the problems and our proposed solutions. We formulate a measure that lower-bounds the Euclidean distance, and discuss the implementation of the technique on a popular index structure. Extensive experiments confirm the superiority of this technique over recent state-of-the-art schemes.
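
The abstract does not name the two parameters or define the lower-bounding measure, so the following is only an illustrative sketch of the general filter-and-refine idea, not the paper's method. As a stand-in, it assumes each point is reduced to two hypothetical parameters: `p`, the signed projection onto the unit diagonal, and `h`, the distance from the point to that diagonal. For any two points, the Euclidean distance between their `(p, h)` pairs never exceeds their true Euclidean distance, so pruning with it causes no false dismissals. All function names here are made up for illustration.

```python
import math
import random


def reduce_point(x):
    """Map a d-dimensional point to two parameters (p, h).

    p: signed length of the projection onto the unit diagonal (1,...,1)/sqrt(d)
    h: distance from the point to the diagonal line
    These are hypothetical stand-ins, not the parameters defined in the paper.
    """
    d = len(x)
    p = sum(x) / math.sqrt(d)
    sq_norm = sum(v * v for v in x)
    h = math.sqrt(max(sq_norm - p * p, 0.0))  # clamp tiny negative rounding error
    return p, h


def lower_bound(a, b):
    """Distance in the reduced (p, h) plane; never exceeds the true distance."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def euclidean(x, y):
    return math.dist(x, y)


def range_search(points, reduced, query, radius, eps=1e-9):
    """Filter with the cheap lower bound, then refine with exact distances.

    Returns the indices of true hits and the number of candidates that
    survived the filter step. eps guards against floating-point rounding.
    """
    q = reduce_point(query)
    candidates = [i for i, r in enumerate(reduced)
                  if lower_bound(r, q) <= radius + eps]
    hits = [i for i in candidates if euclidean(points[i], query) <= radius]
    return hits, len(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    points = [[rng.random() for _ in range(16)] for _ in range(1000)]
    reduced = [reduce_point(p) for p in points]  # precomputed once, offline
    query = [rng.random() for _ in range(16)]
    hits, n_candidates = range_search(points, reduced, query, radius=1.2)
    print(f"{len(hits)} hits from {n_candidates} candidates of {len(points)} points")
```

The bound holds because each point splits into a component along the diagonal and a component perpendicular to it. The diagonal components differ by exactly `|p1 - p2|`, and the perpendicular components differ by at least `|h1 - h2|` by the reverse triangle inequality. How tightly a two-parameter bound like this prunes in practice depends on the data; the abstract's claims about bounded approximation volume refer to the paper's own parameters, not to this stand-in.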