Parallel Cover Trees and their Applications

Yan Gu, Zachary Napier, Yihan Sun, Letong Wang
Proceedings of the 34th ACM Symposium on Parallelism in Algorithms and Architectures, published 2022-07-11
DOI: 10.1145/3490148.3538581 (https://doi.org/10.1145/3490148.3538581)
Citations: 6

Abstract

The cover tree is the canonical data structure for maintaining a dynamic set of points in a metric space while supporting nearest and k-nearest neighbor searches. For most real-world datasets with reasonable distributions (formally, constant expansion rate and bounded aspect ratio), single-point insertion, single-point deletion, and nearest neighbor search (NNS) each have cost only logarithmic in the size of the point set. Unfortunately, because the cover tree algorithms are intricate and rely on a depth-first traversal order, we are unaware of any parallel approaches for them. This paper presents highly parallel and work-efficient cover tree algorithms that handle batch insertions (and thus construction) and batch deletions. Assuming constant expansion rate and bounded aspect ratio, inserting m points into, or deleting m points from, a cover tree with n points takes O(m log n) expected work and polylogarithmic span with high probability. Our algorithms rely on several novel algorithmic insights. We model the insertion and deletion process as a graph and use a maximal independent set (MIS) to generate tree nodes without conflicts. Three key ideas guarantee work-efficiency: a prefix-doubling scheme, a careful design that limits the size of the graph on which we apply MIS, and a strategy for propagating information among different levels of the cover tree. We also use path-copying to make our parallel cover tree a persistent data structure, which is useful in several applications. Using our parallel cover trees, we give work-efficient (or near-work-efficient) and highly parallel solutions for a list of problems in computational geometry and machine learning, including Euclidean minimum spanning tree (EMST), single-linkage clustering, bichromatic closest pair (BCP), density-based clustering and its hierarchical version, and others. To the best of our knowledge, many of these are the first solutions to achieve work-efficiency and polylogarithmic span assuming constant expansion rate and bounded aspect ratio.
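
To make the MIS idea concrete, here is a minimal sequential sketch in Python. It is not the paper's parallel algorithm, and it omits prefix doubling, graph-size limiting, and cross-level propagation. It assumes the usual cover tree invariants: nodes at level i are pairwise more than 2^i apart, and every other point lies within 2^i of a parent at level i. The function names (`conflict_graph`, `greedy_mis`, `select_level_nodes`) and the Euclidean metric are assumptions for illustration only.

```python
import math
from itertools import combinations


def conflict_graph(points, radius):
    """Build adjacency sets: two candidates conflict if they are within `radius`."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_mis(adj, order):
    """Sequential greedy MIS: keep a vertex if none of its neighbors was kept."""
    chosen = set()
    for v in order:
        if adj[v].isdisjoint(chosen):
            chosen.add(v)
    return chosen


def select_level_nodes(points, level):
    """Pick conflict-free nodes for one level and assign the rest to a parent.

    Hypothetical helper: the picked points are pairwise more than 2**level
    apart (separation), and every unpicked point has a picked neighbor within
    2**level (covering), which follows from maximality of the independent set.
    """
    radius = 2.0 ** level
    adj = conflict_graph(points, radius)
    picked = greedy_mis(adj, range(len(points)))
    parent = {}
    for v in range(len(points)):
        if v not in picked:
            # Choose the nearest picked neighbor as the would-be parent.
            parent[v] = min(
                adj[v] & picked,
                key=lambda u: math.dist(points[u], points[v]),
            )
    return picked, parent


# Small illustrative input (made-up coordinates).
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (10.0, 10.0)]
picked, parent = select_level_nodes(points, level=0)
```

The independent-set property yields separation and maximality yields covering, which is why an MIS fits the task of generating nodes without conflicts. A parallel version would replace the sequential scan in `greedy_mis` with a parallel MIS routine.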