Novel approach for nearest neighbor search in high dimensional space

2008 4th International IEEE Conference Intelligent Systems. Pub Date: 2008-11-11. DOI: 10.1109/IS.2008.4670504

Ming Zhang, R. Alhajj


Citations: 2

Abstract


Index structures for nearest neighbor search in high-dimensional metric spaces are mostly built by partitioning the data set according to distances to one or more reference points. With such an index, the search is restricted to a small number of partitions, which avoids an exhaustive scan. However, the approaches described in the literature either ignore the properties of the data distribution or produce non-disjoint partitions; this greatly affects search efficiency. In this paper, we propose a new index structure that overcomes these disadvantages. The proposed tree structure is constructed by recursively dividing the data set into a nested set of approximate equivalence classes. We also propose a new reference point selection method based on principal component analysis (PCA). The analysis and the reported test results demonstrate that the proposed index structure, combined with the PCA-based reference selection strategy, gives an optimal partition of the data set and greatly improves search efficiency compared with the VP-tree, one of the well-documented approaches in the literature.
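To make the general idea of distance-based partitioning concrete, the following is a minimal Python sketch of a VP-tree-style index, the baseline named in the abstract. It is not the index structure proposed in the paper, whose details the abstract does not give. The function names (`principal_axis`, `pick_reference`, `build`, `nearest`) are hypothetical, and the PCA-based choice of reference point (the data point with the largest projection on the first principal axis) is only one plausible reading of "PCA-based reference selection", assumed here for illustration.

```python
import math
import random

def principal_axis(points, iters=20):
    """Approximate the first principal axis by power iteration (illustrative)."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    centered = [[p[i] - mean[i] for i in range(dim)] for p in points]
    axis = [1.0] * dim
    for _ in range(iters):
        # Apply the unnormalised covariance matrix: X^T (X axis).
        proj = [sum(c[i] * axis[i] for i in range(dim)) for c in centered]
        nxt = [sum(proj[k] * centered[k][i] for k in range(len(centered)))
               for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:  # degenerate data: keep the current direction
            break
        axis = [v / norm for v in nxt]
    return mean, axis

def pick_reference(points):
    """ASSUMPTION: use the point with the largest projection on the first axis."""
    mean, axis = principal_axis(points)
    def proj(p):
        return sum((p[i] - mean[i]) * axis[i] for i in range(len(axis)))
    return max(range(len(points)), key=lambda k: proj(points[k]))

def build(points):
    """Recursively split points into an inner and an outer set around a reference."""
    if not points:
        return None
    k = pick_reference(points)
    ref = points[k]
    rest = points[:k] + points[k + 1:]
    if not rest:
        return {"ref": ref, "radius": 0.0, "inner": None, "outer": None}
    dists = sorted(math.dist(ref, p) for p in rest)
    radius = dists[len(dists) // 2]  # median distance to the reference
    inner = [p for p in rest if math.dist(ref, p) < radius]
    outer = [p for p in rest if math.dist(ref, p) >= radius]
    return {"ref": ref, "radius": radius,
            "inner": build(inner), "outer": build(outer)}

def nearest(node, query, best=None):
    """Return (distance, point) for the nearest neighbor of query."""
    if node is None:
        return best
    d = math.dist(query, node["ref"])
    if best is None or d < best[0]:
        best = (d, node["ref"])
    # Search the side containing the query first; visit the other side only
    # when the triangle inequality cannot rule it out.
    if d < node["radius"]:
        best = nearest(node["inner"], query, best)
        if d + best[0] >= node["radius"]:
            best = nearest(node["outer"], query, best)
    else:
        best = nearest(node["outer"], query, best)
        if d - best[0] < node["radius"]:
            best = nearest(node["inner"], query, best)
    return best

# Small demo on random data: the tree search should agree with a linear scan.
random.seed(0)
data = [tuple(random.random() for _ in range(8)) for _ in range(200)]
tree = build(data)
query = tuple(random.random() for _ in range(8))
dist, point = nearest(tree, query)
assert abs(dist - min(math.dist(query, p) for p in data)) < 1e-12
```

The pruning in `nearest` is exact for any metric, so the choice of reference point affects only how many partitions are visited, not the result. This sketch splits at the median distance, so the two partitions are disjoint; how the paper's approximate equivalence classes are formed is not specified in the abstract and is not modeled here.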
