{"title":"Revisiting kd-tree for Nearest Neighbor Search","authors":"P. Ram, Kaushik Sinha","doi":"10.1145/3292500.3330875","DOIUrl":null,"url":null,"abstract":"\\kdtree \\citefriedman1976algorithm has long been deemed unsuitable for exact nearest-neighbor search in high dimensional data. The theoretical guarantees and the empirical performance of \\kdtree do not show significant improvements over brute-force nearest-neighbor search in moderate to high dimensions. \\kdtree has been used relatively more successfully for approximate search \\citemuja2009flann but lack theoretical guarantees. In the article, we build upon randomized-partition trees \\citedasgupta2013randomized to propose \\kdtree based approximate search schemes with $O(d łog d + łog n)$ query time for data sets with n points in d dimensions and rigorous theoretical guarantees on the search accuracy. We empirically validate the search accuracy and the query time guarantees of our proposed schemes, demonstrating the significantly improved scaling for same level of accuracy.","PeriodicalId":186134,"journal":{"name":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"59","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3292500.3330875","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 59
Abstract
\kdtree \citefriedman1976algorithm has long been deemed unsuitable for exact nearest-neighbor search in high dimensional data. The theoretical guarantees and the empirical performance of \kdtree do not show significant improvements over brute-force nearest-neighbor search in moderate to high dimensions. \kdtree has been used relatively more successfully for approximate search \citemuja2009flann but lack theoretical guarantees. In the article, we build upon randomized-partition trees \citedasgupta2013randomized to propose \kdtree based approximate search schemes with $O(d łog d + łog n)$ query time for data sets with n points in d dimensions and rigorous theoretical guarantees on the search accuracy. We empirically validate the search accuracy and the query time guarantees of our proposed schemes, demonstrating the significantly improved scaling for same level of accuracy.