The hybrid tree: an index structure for high dimensional feature spaces

K. Chakrabarti, S. Mehrotra
Published in: Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337), 1999-03-23
DOI: 10.1109/ICDE.1999.754960
Citation count: 258

Abstract

Feature-based similarity searching is emerging as an important search paradigm in database systems. The technique maps data items to points in a high-dimensional feature space, which is indexed using a multidimensional data structure. Similarity searching then corresponds to a range search over that data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree, a multidimensional data structure for indexing high-dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (such as the R-tree, SS-tree or SR-tree) or a pure space partitioning (SP) one (such as the KDB-tree or hB-tree); rather, it combines the positive aspects of the two types of index structures into a single data structure to achieve search performance that scales better to high dimensionalities than either of the above techniques. Furthermore, unlike many data structures (e.g. distance-based index structures like the SS-tree and SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on "real" high-dimensional, large feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms, as well as linear scans, at all dimensionalities for large databases.
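
To make the correspondence between similarity search and range search concrete, the following is a minimal Python sketch of the linear-scan baseline that the abstract compares against. It is not the hybrid tree itself, whose internals the abstract does not describe. The names `range_search`, `euclidean`, and `manhattan`, and the sample feature vectors, are illustrative assumptions and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """L2 distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    """L1 distance between two feature vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def range_search(points: Sequence[Point], query: Point, radius: float,
                 dist: Distance = euclidean) -> List[int]:
    """Linear scan: return indices of all points within `radius` of `query`.

    The distance function is a parameter, so any distance can be plugged in.
    An index structure aims to return the same answer while examining
    fewer points than this scan does.
    """
    return [i for i, p in enumerate(points) if dist(p, query) <= radius]


# Hypothetical 4-dimensional feature vectors (illustrative data only).
features = [
    (0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0, 1.0),
    (3.0, 3.0, 3.0, 3.0),
]
query = (0.0, 0.0, 0.0, 0.0)

print(range_search(features, query, 2.0))             # [0, 1, 2]
print(range_search(features, query, 2.0, manhattan))  # [0, 1]
```

The same query returns different result sets under the two distance functions, which is why support for arbitrary distance functions matters for an index: a structure built around one fixed metric cannot answer the second query without modification.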