Indexing history spatio-temporal samples with error-bounded regression Hilbert trees

IF 4.0 | Zone 3 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)
Jingyu Han, Jin Chen, Fan Wu, Wenzhen Wang, Yi Mao, Yiting Zhang
Journal: Computers & Electrical Engineering, vol. 124, Article 110394
Published: 2025-05-01
DOI: 10.1016/j.compeleceng.2025.110394
URL: https://www.sciencedirect.com/science/article/pii/S0045790625003374
Citations: 0

Abstract

Indexing history spatio-temporal samples with error-bounded regression Hilbert trees
To support queries over historical spatio-temporal data, existing learned multi-dimensional indexes usually suffer from either redundant scanning or missing answers during query processing because their indexing is approximate. To answer window queries constrained by a road network, this paper proposes learning an Error-bounded regression Hilbert Tree (EHT) that predicts a sample’s storage position as a function of its position on a Hilbert curve. Learning consists of data ordering and model training. During data ordering, every road-network-constrained sample is projected onto a position-time grid whose cells are traversed by a Hilbert curve. During model training, historical samples are divided into a series of equal-duration periods, and an EHT is constructed for each period to map every spatio-temporal sample to its storage position. The EHT covers every qualifying sample within the given error bound, thus achieving full recall. Further, it models the distribution with as tight a strip as possible, which minimizes the disk scanning cost during query processing. Extensive experiments on real and synthetic datasets show that, compared to state-of-the-art learned indexes, the EHT reduces the index size by a factor of 1.46 to 43.4 while consuming far fewer disk I/Os, or at most one or two more. Compared with the delicate indexes, the EHT reduces the index size by a factor of 5.46 to 62.5 while consuming far fewer or roughly the same number of disk I/Os.
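The two ideas in the abstract, ordering grid cells along a Hilbert curve and predicting a storage position within a known error bound, can be illustrated with a minimal Python sketch. It is not the authors' EHT construction. The recursive halving rule, the least-squares leaf model, and the names `hilbert_index`, `fit_line`, `build`, `lookup`, `max_err`, and `min_leaf` are all assumptions made for illustration, and keys are assumed to be distinct.

```python
import math
from bisect import bisect_left

def hilbert_index(n, x, y):
    """Hilbert-curve order of cell (x, y) on an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def fit_line(keys, start):
    """Least-squares line pos ~ a*key + b for keys stored at start, start+1, ...
    Returns (a, b, eps) where eps is the rounded-up worst absolute error."""
    m = len(keys)
    mean_k = sum(keys) / m
    mean_p = start + (m - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    a = 0.0 if var == 0 else sum(
        (k - mean_k) * (start + i - mean_p) for i, k in enumerate(keys)) / var
    b = mean_p - a * mean_k
    err = max(abs(a * k + b - (start + i)) for i, k in enumerate(keys))
    return a, b, math.ceil(err)

def build(keys, start, max_err, min_leaf=4):
    """Hypothetical tree: split in half until a leaf's line fits within max_err."""
    a, b, eps = fit_line(keys, start)
    if eps <= max_err or len(keys) <= min_leaf:
        return {"leaf": True, "a": a, "b": b, "eps": eps,
                "lo": start, "hi": start + len(keys) - 1}
    mid = len(keys) // 2
    return {"leaf": False, "split": keys[mid],
            "left": build(keys[:mid], start, max_err, min_leaf),
            "right": build(keys[mid:], start + mid, max_err, min_leaf)}

def lookup(root, keys, key):
    """Return the storage position of key, scanning only the predicted strip."""
    node = root
    while not node["leaf"]:
        node = node["left"] if key < node["split"] else node["right"]
    center = node["a"] * key + node["b"]
    lo = max(node["lo"], math.floor(center - node["eps"]))
    hi = min(node["hi"], math.ceil(center + node["eps"]))
    if lo > hi:
        return None
    i = bisect_left(keys, key, lo, hi + 1)
    return i if i <= hi and keys[i] == key else None

# Toy data: (position cell, time cell) pairs on a 64 x 64 position-time grid.
n = 64
cells = [(x, y) for x in range(0, n, 3) for y in range(0, n, 5)]
keys = sorted(hilbert_index(n, x, y) for x, y in cells)  # storage order
root = build(keys, 0, max_err=8)
```

Because each leaf stores the worst error observed over its own keys, every indexed key falls inside the scanned strip, which is the full-recall property the abstract describes. A tighter `max_err` shortens each scan at the cost of more leaves.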
Source journal
Computers & Electrical Engineering (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 9.20
Self-citation rate: 7.00%
Articles published: 661
Review time: 47 days
About the journal: The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency. Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.