在任意度量空间中有效的反向k近邻搜索

Proceedings of the 2006 ACM SIGMOD international conference on Management of data Pub Date : 2006-06-27 DOI:10.1145/1142473.1142531

Elke Achtert, C. Böhm, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz

{"title":"在任意度量空间中有效的反向k近邻搜索","authors":"Elke Achtert, C. Böhm, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz","doi":"10.1145/1142473.1142531","DOIUrl":null,"url":null,"abstract":"The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition - to the best of our knowledge - all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.","PeriodicalId":416090,"journal":{"name":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","volume":"84 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"147","resultStr":"{\"title\":\"Efficient reverse k-nearest neighbor search in arbitrary metric spaces\",\"authors\":\"Elke Achtert, C. Böhm, Peer Kröger, Peter Kunath, A. Pryakhin, M. Renz\",\"doi\":\"10.1145/1142473.1142531\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition - to the best of our knowledge - all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.\",\"PeriodicalId\":416090,\"journal\":{\"name\":\"Proceedings of the 2006 ACM SIGMOD international conference on Management of data\",\"volume\":\"84 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"147\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2006 ACM SIGMOD international conference on Management of data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1142473.1142531\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2006 ACM SIGMOD international conference on Management of data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1142473.1142531","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 147

摘要

逆k近邻问题(RkNN)是近年来越来越受到关注的逆1近邻问题的推广，即在数据集中查找包含指定查询对象的k近邻的所有对象。许多工业和科学应用需要在任意度量空间中求解RkNN问题，其中数据对象不是欧几里德的，并且仅给出度量距离函数来指定对象相似性。通常，这些应用程序需要一个一般化问题的解决方案，其中k的值事先不知道，并且在每个查询之间可能会发生变化。然而，除了一种方法之外，现有的方法都是针对特定的R1NN问题设计的。此外，据我们所知，所有以前提出的方法，特别是广义RkNN搜索方法，只适用于欧几里得向量数据，而不适用于一般度量对象。在本文中，我们提出了在任意度量空间中有效的RkNN搜索的第一种方法，其中k的值在查询时指定。我们的方法利用了现有度量索引结构的优点，但建议使用保守和渐进距离近似来过滤掉真正的掉落和真正的命中。特别是，我们使用两个函数，每个函数只有两个参数，通过上界和下界近似每个数据对象的k近邻距离。因此，我们的方法不会产生任何可观的存储开销。我们在对真实世界数据的广泛实验评估中展示了我们的新方法的可扩展性和可用性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient reverse k-nearest neighbor search in arbitrary metric spaces

The reverse k-nearest neighbor (RkNN) problem, i.e. finding all objects in a data set the k-nearest neighbors of which include a specified query object, is a generalization of the reverse 1-nearest neighbor problem which has received increasing attention recently. Many industrial and scientific applications call for solutions of the RkNN problem in arbitrary metric spaces where the data objects are not Euclidean and only a metric distance function is given for specifying object similarity. Usually, these applications need a solution for the generalized problem where the value of k is not known in advance and may change from query to query. However, existing approaches, except one, are designed for the specific R1NN problem. In addition - to the best of our knowledge - all previously proposed methods, especially the one for generalized RkNN search, are only applicable to Euclidean vector data but not for general metric objects. In this paper, we propose the first approach for efficient RkNN search in arbitrary metric spaces where the value of k is specified at query time. Our approach uses the advantages of existing metric index structures but proposes to use conservative and progressive distance approximations in order to filter out true drops and true hits. In particular, we approximate the k-nearest neighbor distance for each data object by upper and lower bounds using two functions of only two parameters each. Thus, our method does not generate any considerable storage overhead. We show in a broad experimental evaluation on real-world data the scalability and the usability of our novel approach.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2006 ACM SIGMOD international conference on Management of data

自引率

0.00%

发文量