Density based learned spatial index for clustered data

IF 3.4 | Zone 2, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-08-11 DOI:10.1016/j.is.2025.102606

Xiaofei Zhao, Kam-Yiu Lam


Citations: 0

Abstract

Retrieving spatial points, such as GPS records or points of interest, that satisfy specific location-based query criteria is a core operation in location-based services. Recent studies have shown that learned indexes can outperform traditional indexing methods in both query performance and space efficiency by leveraging data distribution to construct compact predictive models. On the other hand, traditional indexes typically make minimal assumptions about the underlying data distribution. In real-world spatial databases, data is often non-uniformly distributed and tends to cluster in specific regions or along road networks. Adaptivity to such data patterns may bring performance benefits.

In this paper, we explore the construction of efficient learned indexes that exploit the clustering characteristics of spatial datasets. Specifically, we propose a Density-based Grid Learning Spatial Index (DGLSI), which partitions the spatial domain based on point density and utilizes learned models, including multiple recursive model indexes, to predict the grid cell IDs of query points. We evaluate DGLSI’s performance on real-world GPS datasets and demonstrate that the proposed methods outperform analogous grid-based indexes across various query workloads, including nearest point queries and range queries, while maintaining high space efficiency.
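
The partition-then-predict idea can be illustrated with a minimal sketch. It is not the authors' DGLSI implementation: the abstract gives no algorithmic details, so the equi-depth (quantile) splitting, the single linear model per axis (used here in place of the recursive model indexes the paper mentions), and all names (`equi_depth`, `LearnedBuckets`, `DensityGrid`) are assumptions made purely for illustration.

```python
from bisect import bisect_right
import random

def equi_depth(values, parts):
    """Boundaries that split `values` into `parts` buckets of near-equal count."""
    s = sorted(values)
    return [s[(i * len(s)) // parts] for i in range(1, parts)]

class LearnedBuckets:
    """Linear model that guesses a bucket index, then corrects it locally."""
    def __init__(self, values, parts):
        self.bounds = equi_depth(values, parts)
        ids = [bisect_right(self.bounds, v) for v in values]
        n = len(values)
        mx, my = sum(values) / n, sum(ids) / n
        var = sum((v - mx) ** 2 for v in values)
        cov = sum((v - mx) * (i - my) for v, i in zip(values, ids))
        self.a = cov / var if var else 0.0   # least-squares slope
        self.b = my - self.a * mx            # least-squares intercept
        # The largest training error bounds the local search window.
        self.err = max(abs(self._guess(v) - i) for v, i in zip(values, ids))

    def _guess(self, v):
        g = round(self.a * v + self.b)
        return min(max(g, 0), len(self.bounds))

    def lookup(self, v):
        g = self._guess(v)
        lo, hi = max(0, g - self.err), min(len(self.bounds), g + self.err)
        k = bisect_right(self.bounds, v, lo, hi)
        b = self.bounds
        ok = (k == 0 or b[k - 1] <= v) and (k == len(b) or v < b[k])
        # Unseen queries may exceed the training error bound: fall back.
        return k if ok else bisect_right(b, v)

class DensityGrid:
    """Density-adaptive grid: quantile columns on x, quantile rows per column on y."""
    def __init__(self, points, cols, rows):
        self.rows = rows
        self.xmodel = LearnedBuckets([p[0] for p in points], cols)
        columns = [[] for _ in range(cols)]
        for x, y in points:
            columns[self.xmodel.lookup(x)].append(y)
        self.ymodels = [LearnedBuckets(ys, rows) if ys else None for ys in columns]

    def cell_id(self, x, y):
        c = self.xmodel.lookup(x)
        m = self.ymodels[c]
        r = m.lookup(y) if m else 0
        return c * self.rows + r

random.seed(0)
# Two synthetic clusters standing in for dense regions (illustrative data only).
points = [(random.gauss(cx, 0.5), random.gauss(cy, 0.5))
          for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(1000)]
grid = DensityGrid(points, cols=8, rows=8)
print(grid.cell_id(0.1, -0.2), grid.cell_id(10.3, 9.8))
```

Because the cell boundaries follow quantiles of the data, dense regions receive narrow cells and sparse regions wide ones, so each cell holds a similar number of points. The model's prediction only narrows the search window; the final boundary check keeps lookups exact even for query points the model never saw.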




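Source journal

Information Systems (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days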

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.