关于k近邻问题的I/O复杂度

Mayank Goswami, R. Jacob, R. Pagh
{"title":"关于k近邻问题的I/O复杂度","authors":"Mayank Goswami, R. Jacob, R. Pagh","doi":"10.1145/3375395.3387649","DOIUrl":null,"url":null,"abstract":"We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the I/O Complexity of the k-Nearest Neighbors Problem\",\"authors\":\"Mayank Goswami, R. Jacob, R. Pagh\",\"doi\":\"10.1145/3375395.3387649\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.\",\"PeriodicalId\":412441,\"journal\":{\"name\":\"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-02-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3375395.3387649\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3375395.3387649","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

我们考虑了k-最近邻(k- nn)问题的精确和近似版本的静态,外部内存索引,并在标准不可分割假设下显示了新的下界:在Hamming空间中高维k- nn的多项式空间索引方案不能利用块传输:í(k)块读取需要回答查询。对于l∞度量,即使我们允许返回c-近似近邻,下界也保持不变,对于c∈(1,3)。对c < 3的限制是必要的:对于每个度量,在Hellerstein等人的可索引性模型中存在一个索引方案,使用空间O(kn),其中n是点的数量,可以使用最优的(k /B) I/O检索k - 3-近似近邻,其中B是块大小。对于特定的度量,可以使用具有更好近似因子的数据结构。对于Hamming空间中的k- nn,且每一个近似因子c>1,存在一个多项式空间数据结构,该结构返回k个c-近似的最近邻在≤k/B≤I/ o。为了显示这些下界,我们开发了两种新技术:首先,为了处理近似算法在决定返回哪个结果集方面有更大的自由,我们开发了Hellerstein等人的λ集工作量技术的放宽版本。这种技术使我们能够证明在d≥n维情况下的下界。为了将下界扩展到d = O(k log(n/k))维数,我们开发了一种新的确定性降维技术,这可能是一个独立的兴趣。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
On the I/O Complexity of the k-Nearest Neighbors Problem
We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption: Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: í(k) block reads are needed to to answer a query. For the l∞ metric the lower bound holds even if we allow c-appoximate nearest neighbors to be returned, for c ∈ (1, 3). The restriction to c < 3 is necessary: For every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points, that can retrieve k 3-approximate nearest neighbors using optimal ⌈k/B⌉ I/Os, where B is the block size. For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c>1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in ⌈k/B⌉ I/Os. To show these lower bounds we develop two new techniques: First, to handle that approximation algorithms have more freedom in deciding which result set to return we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. To extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信