A non-linear dimensionality-reduction technique for fast similarity search in large databases

Khanh Vu, K. Hua, Hao Cheng, S. Lang
Proceedings of the 2006 ACM SIGMOD international conference on Management of data. Published 2006-06-27. DOI: 10.1145/1142473.1142532 (https://doi.org/10.1145/1142473.1142532)
Citations: 36

Abstract

To enable efficient similarity search in large databases, many indexing techniques use a linear transformation scheme to reduce dimensions and allow fast approximation. In this reduction approach the approximation is unbounded, so the approximation volume extends across the dataspace. This causes over-estimation of retrieval sets and impairs performance. This paper presents a non-linear transformation scheme that extracts two important parameters specifying the data. We prove that these parameters correspond to a bounded volume around the search sphere, irrespective of dimensionality. We use a special workspace-mapping mechanism to derive tight bounds for the parameters, to prove further results, and to highlight insights into the problems and our proposed solutions. We formulate a measure that lower-bounds the Euclidean distance, and discuss the implementation of the technique upon a popular index structure. Extensive experiments confirm the superiority of this technique over recent state-of-the-art schemes.
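The abstract does not say which two parameters the transformation extracts, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes, purely for illustration, that each vector is reduced to its mean and population standard deviation. For d-dimensional vectors these two values give a measure, sqrt(d * ((mean difference)^2 + (std difference)^2)), that never exceeds the Euclidean distance, so it can filter candidates without losing any true result. All function names here are hypothetical.

```python
import math
import random


def reduce_vector(v):
    """Non-linear reduction (assumed for illustration): vector -> (mean, population std)."""
    d = len(v)
    mean = sum(v) / d
    var = sum((x - mean) ** 2 for x in v) / d
    return mean, math.sqrt(var)


def lower_bound(p, q, d):
    """Distance between two reduced points, scaled by sqrt(d).

    This value is never larger than the Euclidean distance between the
    original d-dimensional vectors, so filtering on it causes no false dismissals.
    """
    return math.sqrt(d * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2))


def euclidean(a, b):
    """Exact Euclidean distance in the original space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(data, reduced, query, radius):
    """Filter with the cheap two-parameter bound, then refine with exact distances.

    Returns the indices of true results and the number of candidates
    that survived the filter step.
    """
    d = len(query)
    q_red = reduce_vector(query)
    candidates = [i for i, r in enumerate(reduced)
                  if lower_bound(r, q_red, d) <= radius]
    hits = [i for i in candidates if euclidean(data[i], query) <= radius]
    return hits, len(candidates)


# Usage: precompute the reduced representation once, then query against it.
random.seed(0)
dim = 16
data = [[random.random() for _ in range(dim)] for _ in range(200)]
reduced = [reduce_vector(v) for v in data]
query = [random.random() for _ in range(dim)]
hits, n_candidates = range_search(data, reduced, query, radius=1.2)
```

In a real system the reduced two-dimensional points would be stored in an index structure rather than scanned as a list; the linear scan here only keeps the sketch short.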