Jingyu Han, Jin Chen, Fan Wu, Wenzhen Wang, Yi Mao, Yiting Zhang
Computers & Electrical Engineering, vol. 124, Article 110394. Published 2025-05-01. DOI: 10.1016/j.compeleceng.2025.110394
https://www.sciencedirect.com/science/article/pii/S0045790625003374
Indexing history spatio-temporal samples with error-bounded regression Hilbert trees
Existing learned multi-dimensional indexes for querying historical spatio-temporal data usually suffer from either redundant scanning or missing answers during query processing, because their indexing is approximate. To answer window queries constrained by a road network, this paper proposes learning an Error-bounded regression Hilbert Tree (EHT) that predicts a sample's storage position using a function of the Hilbert curve. The learning consists of data ordering and model training. During data ordering, every road-network-constrained sample is projected onto a position-time grid whose cells are traversed by a Hilbert curve. During model training, the historical samples are divided into a series of equal-duration periods, and an EHT is constructed for every period to establish the mapping between each spatio-temporal sample and its storage position. The EHT safely covers every qualifying sample within the given error bound, thus achieving full recall. Further, it models the distribution with as tight a strip as possible, thus reducing the disk scanning cost to the minimum during query processing. Extensive experiments on real and synthetic datasets demonstrate that, compared to the state-of-the-art learned indexes, the EHT reduces the index size by a factor between 1.46 and 43.4 while consuming far fewer disk I/Os, or at most one or two more. In comparison with the delicate indexes, the EHT reduces the index size by a factor between 5.46 and 62.5 while consuming far fewer or roughly the same number of disk I/Os.
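The two learning steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' EHT: it assumes samples are already projected to integer (position, time) cells of a 2^order x 2^order grid, it uses the textbook Hilbert curve mapping, and it replaces the regression tree with a single least-squares line whose maximum residual serves as the error bound. The names `hilbert_index`, `fit_error_bounded_line`, and `scan_range` are hypothetical.

```python
import math


def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) in a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def fit_error_bounded_line(keys):
    """Fit position ~ slope * key + intercept over sorted keys.

    The storage position of a sample is assumed to be its rank in the
    sorted key list. Returns (slope, intercept, err), where err is the
    largest absolute residual, so every sample lies within +/- err of
    its predicted position.
    """
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(keys))
    slope = cov / var if var else 0.0
    intercept = mean_p - slope * mean_k
    err = max(abs(slope * k + intercept - p) for p, k in enumerate(keys))
    return slope, intercept, err


def scan_range(model, n, key_lo, key_hi):
    """Positions [first, last] guaranteed to hold every key in [key_lo, key_hi]."""
    slope, intercept, err = model
    first = max(0, math.floor(slope * key_lo + intercept - err))
    last = min(n - 1, math.ceil(slope * key_hi + intercept + err))
    return first, last


# Illustrative, made-up samples on a 16 x 16 position-time grid.
samples = [((i * 7) % 16, (i * i + 3) % 16) for i in range(200)]
keys = sorted(hilbert_index(4, pos, t) for pos, t in samples)
model = fit_error_bounded_line(keys)
first, last = scan_range(model, len(keys), 50, 120)
hits = [p for p in range(first, last + 1) if 50 <= keys[p] <= 120]
```

Because positions are non-decreasing in the key, the fitted slope is non-negative, so the scanned interval covers every stored sample whose key falls in the queried range; that is the full-recall property in miniature. A least-squares line does not generally give the tightest possible strip, and a real window query would typically decompose into several Hilbert key ranges, each handled as above.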
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.