{"title":"混合树:高维特征空间的索引结构","authors":"K. Chakrabarti, S. Mehrotra","doi":"10.1109/ICDE.1999.754960","DOIUrl":null,"url":null,"abstract":"Feature-based similarity searching is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high-dimensional feature space which is indexed using a multidimensional data structure. Similarity searching then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree-a multidimensional data structure for indexing high-dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (such as the R-tree, SS-tree or SR-tree) or a pure space partitioning (SP) one (such as the KDB-tree or hB-tree); rather it combines the positive aspects of the two types of index structures into a single data structure to achieve a search performance which is more scalable to high dimensionalities than either of the above techniques. Furthermore, unlike many data structures (e.g. distance-based index structures like the SS-tree and SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on \"real\" high-dimensional large-size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scans at all dimensionalities for large-sized databases.","PeriodicalId":236128,"journal":{"name":"Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"258","resultStr":"{\"title\":\"The hybrid tree: an index structure for high dimensional feature spaces\",\"authors\":\"K. Chakrabarti, S. Mehrotra\",\"doi\":\"10.1109/ICDE.1999.754960\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature-based similarity searching is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high-dimensional feature space which is indexed using a multidimensional data structure. Similarity searching then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree-a multidimensional data structure for indexing high-dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (such as the R-tree, SS-tree or SR-tree) or a pure space partitioning (SP) one (such as the KDB-tree or hB-tree); rather it combines the positive aspects of the two types of index structures into a single data structure to achieve a search performance which is more scalable to high dimensionalities than either of the above techniques. Furthermore, unlike many data structures (e.g. distance-based index structures like the SS-tree and SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on \\\"real\\\" high-dimensional large-size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scans at all dimensionalities for large-sized databases.\",\"PeriodicalId\":236128,\"journal\":{\"name\":\"Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-03-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"258\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.1999.754960\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.1999.754960","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The hybrid tree: an index structure for high dimensional feature spaces
Feature-based similarity searching is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high-dimensional feature space which is indexed using a multidimensional data structure. Similarity searching then corresponds to a range search over the data structure. Although several data structures have been proposed for feature indexing, none of them is known to scale beyond 10-15 dimensional spaces. This paper introduces the hybrid tree-a multidimensional data structure for indexing high-dimensional feature spaces. Unlike other multidimensional data structures, the hybrid tree cannot be classified as either a pure data partitioning (DP) index structure (such as the R-tree, SS-tree or SR-tree) or a pure space partitioning (SP) one (such as the KDB-tree or hB-tree); rather it combines the positive aspects of the two types of index structures into a single data structure to achieve a search performance which is more scalable to high dimensionalities than either of the above techniques. Furthermore, unlike many data structures (e.g. distance-based index structures like the SS-tree and SR-tree), the hybrid tree can support queries based on arbitrary distance functions. Our experiments on "real" high-dimensional large-size feature databases demonstrate that the hybrid tree scales well to high dimensionality and large database sizes. It significantly outperforms both purely DP-based and SP-based index mechanisms as well as linear scans at all dimensionalities for large-sized databases.