Compact distance histogram: a novel structure to boost k-nearest neighbor queries
M. Bedo, D. S. Kaster, A. Traina, C. Traina
Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015-06-29
DOI: https://doi.org/10.1145/2791347.2791359
The k-Nearest Neighbor query (k-NNq) is one of the most useful similarity queries. Sophisticated k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer, so estimating a suitable starting radius is essential to accelerating k-NNq execution. This paper presents a new technique for estimating a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. These structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets, a maximum allowed error, or both. The covering radius of a k-NNq is estimated from the relationship between the query element and the joint frequencies of the CDHs. The paper gives a complete specification of CDH-kNN, including CDH construction and radius estimation. Extensive experiments on both real and synthetic datasets showed that our approach was up to 72% faster than existing algorithms and outperformed every competitor in all the setups evaluated. The experiments also showed that our proposal was just 20% slower than the theoretical lower bound.
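To make the idea of a pivot-based histogram defined as a piecewise linear function more concrete, the following minimal Python sketch builds a cumulative distance histogram around a single pivot and inverts it to obtain a starting radius expected to cover k elements. It is an illustration only, not the CDH-kNN procedure from the paper: it uses equal-width buckets rather than error-driven compression, it assumes the query sits at (or very near) the pivot instead of combining the joint frequencies of several CDHs, and the names `build_histogram` and `estimate_radius` are hypothetical.

```python
import bisect


def build_histogram(data, pivot, dist, buckets):
    """Build a cumulative distance histogram around one pivot.

    Returns breakpoints (radius, count), where count is the number of
    elements whose distance to the pivot is <= radius. Consecutive
    breakpoints are meant to be joined by straight lines, giving a
    piecewise linear approximation of the distance distribution.
    Assumption: equal-width buckets, not the paper's compression.
    """
    distances = sorted(dist(pivot, x) for x in data)
    max_d = distances[-1]
    points = [(0.0, 0)]
    for i in range(1, buckets + 1):
        r = max_d * i / buckets
        points.append((r, bisect.bisect_right(distances, r)))
    return points


def estimate_radius(points, k):
    """Invert the piecewise linear function: radius expected to cover k elements.

    Assumption: the query is located at the pivot, so the pivot's
    histogram can be read directly as the query's distance distribution.
    """
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        if c1 >= k:
            # Linear interpolation inside the segment that reaches count k.
            return r0 + (r1 - r0) * (k - c0) / (c1 - c0)
    # Fewer than k elements in total: fall back to the largest radius.
    return points[-1][0]


# Toy 1-D dataset: the integers 0..100, pivot at 0, absolute difference as distance.
points = build_histogram(list(range(101)), 0, lambda a, b: abs(a - b), 4)
radius = estimate_radius(points, 13)  # 12.5
```

On this toy dataset the estimated radius of 12.5 covers exactly the 13 elements 0 through 12, so a range-limited search started with that radius would already contain the 13 nearest neighbors of a query placed at the pivot. With skewed data or a query far from the pivot, a single equal-width histogram can under- or over-estimate the radius, which is the gap the paper's compressed, multi-pivot histograms aim to close.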