Finding HSP neighbors via an exact, hierarchical approach

Impact factor: 3.0 | Zone 2 (Computer Science) | JCR Q2 (Computer Science, Information Systems)
Cole Foster, Edgar Chávez, Benjamin Kimia
Journal: Information Systems, vol. 133, Article 102565
Published: 2025-05-12
DOI: 10.1016/j.is.2025.102565
URL: https://www.sciencedirect.com/science/article/pii/S0306437925000493
Citations: 0

Abstract

The Half Space Proximal (HSP) graph is a low out-degree monotonic graph with applications in many domains, including combinatorial optimization in strings, enhancing kNN classification, simplifying chemical networks, estimating local intrinsic dimensionality, and generating uniform samples from skewed distributions. However, finding the HSP neighbors of a query takes linear time, which limits scalability and has motivated approximate indexing that sacrifices accuracy by restricting the test to a small local neighborhood. This compromise loses crucial long-range connections, which introduces false positives and false negatives and compromises some of the essential properties of the HSP. To overcome these limitations, this paper proposes a fast and exact algorithm for computing the HSP whose complexity is sublinear, as demonstrated by extensive experimentation. Our hierarchical approach applies the triangle inequality to pivots to enable efficient HSP search in metric spaces with the Hilbert Exclusion property. A key component of our approach is the concept of the shifted generalized hyperplane between two points, which allows entire groups of points to be invalidated at once. Our approach computes the exact HSP efficiently, even for datasets containing hundreds of millions of points.
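As a rough illustration of the linear-time baseline and of group invalidation, here is a minimal Python sketch. It assumes the usual HSP test from the literature, which the abstract does not spell out: take the nearest remaining candidate x as a neighbor of the query q, then discard every y that is closer to x than to q. The function names (`hsp_neighbors`, `ball_invalidated`) and the ball-shaped group with a pivot and radius are hypothetical. The group test is a plain triangle-inequality bound, not the paper's shifted generalized hyperplane.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any other metric could be passed instead."""
    return math.dist(a, b)


def hsp_neighbors(q: Point, points: Sequence[Point],
                  dist: Metric = euclidean) -> List[Point]:
    """Brute-force HSP test (assumed standard definition).

    Every round scans all surviving candidates, which is the
    linear cost per query that the abstract refers to.
    """
    candidates = [p for p in points if p != q]
    neighbors: List[Point] = []
    while candidates:
        # The closest surviving candidate becomes the next HSP neighbor.
        x = min(candidates, key=lambda p: dist(q, p))
        neighbors.append(x)
        # Keep only points at least as close to q as to x; this also
        # drops x itself, because dist(x, x) = 0 < dist(q, x).
        candidates = [y for y in candidates if dist(x, y) >= dist(q, y)]
    return neighbors


def ball_invalidated(q: Point, x: Point, pivot: Point, radius: float,
                     dist: Metric = euclidean) -> bool:
    """Sufficient (not necessary) test that x invalidates a whole ball.

    For any y with dist(pivot, y) <= radius, the triangle inequality gives
      dist(x, y) <= dist(x, pivot) + radius
      dist(q, y) >= dist(q, pivot) - radius
    so if the first bound is below the second, every such y is closer
    to x than to q and can be discarded without computing its distances.
    """
    upper_x = dist(x, pivot) + radius
    lower_q = dist(q, pivot) - radius
    return upper_x < lower_q


if __name__ == "__main__":
    pts = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.5), (-2.0, 0.0), (3.0, 3.0)]
    print(hsp_neighbors((0.0, 0.0), pts))
    print(ball_invalidated((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), 0.4))
```

The ball test is deliberately conservative: it can fail to invalidate a group that contains no valid neighbor, but it never discards a true HSP neighbor, which is the property an exact method needs.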
Source journal
Information Systems (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 9.40
Self-citation rate: 2.70%
Articles published: 112
Review time: 53 days
About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.