Estimating the compressibility of raster data

IF 3.4 | Zone 2 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems Pub Date : 2025-09-13 DOI:10.1016/j.is.2025.102624

Martita Muñoz, José Fuentes-Sepúlveda, Cecilia Hernández, Diego Seco

Information Systems, Volume 136, Article 102624. https://www.sciencedirect.com/science/article/pii/S0306437925001103


Abstract

The raster data model is widely used in Geographic Information Systems and image processing. The continuous growth of raster data volume poses significant challenges for storage and management. Compact representations of rasters have emerged as a critical solution to address this issue, leveraging data locality to achieve efficient compression. In this context, the research community has proposed compressibility measures aiming to estimate the compressibility of data. Some measures, initially proposed for sequences, have been extended to two- and three-dimensional matrices. This work conducts an experimental analysis of measures applied to raster data compressibility estimation. The first approach applies a linearization function on raster data with matrix representation and then uses existing one-dimensional compressibility measures. The evaluation of the approach compares 1D compressibility measures with 2D measures, data compressors, Compact Data Structures (CDSs), and spatial locality estimation techniques. The results show that spatial locality, alphabet size, and noise directly influence raster compressibility, having more impact over measures like $z$, $v$, and $g$, compressors (bzip, gzip) and a CDS called $k^{2}$-raster. The second approach introduces $\delta_{\Delta}$, a 2D compressibility measure sensitive to differences within the alphabet values. Its purpose is to refine the estimation of raster compressibility. Results indicate that $\delta_{\Delta}$ is affected by the actual values and their frequencies, aligning with the outcomes of some specific compressors. This alignment underscores the suitability of $\delta_{\Delta}$ for compressibility estimation tasks closely related to those performed by such compressors.
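As a rough illustration of the first approach described in the abstract (linearize the raster, then apply a 1D measure), the following Python sketch compares two linearizations of a small raster. It is not the authors' code. The function names `row_major`, `morton_order`, and `substring_complexity` are hypothetical. The Z-order (Morton) curve is only one possible linearization and is assumed here to run over a square raster whose side is a power of two. The measure is the standard substring complexity, max over k of d_k / k with d_k the number of distinct length-k substrings, chosen for brevity and not because the abstract names it.

```python
def row_major(raster):
    """Concatenate the rows of the raster, top to bottom."""
    return [value for row in raster for value in row]


def morton_order(raster):
    """Visit cells along a Z-order (Morton) curve.

    Assumption: the raster is square and its side is a power of two.
    """
    side = len(raster)
    bits = max(1, (side - 1).bit_length())

    def morton_code(cell):
        # Interleave the bits of the row and column indices.
        row, col = cell
        code = 0
        for bit in range(bits):
            code |= ((row >> bit) & 1) << (2 * bit + 1)
            code |= ((col >> bit) & 1) << (2 * bit)
        return code

    cells = [(r, c) for r in range(side) for c in range(side)]
    return [raster[r][c] for r, c in sorted(cells, key=morton_code)]


def substring_complexity(seq):
    """Return the maximum over k of d_k / k.

    d_k is the number of distinct length-k substrings of seq.
    Brute force, so only suitable for tiny inputs.
    """
    n = len(seq)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {tuple(seq[i:i + k]) for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


# Toy raster made of four constant 2x2 blocks (illustrative data only).
raster = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 4, 4],
    [3, 3, 4, 4],
]
row_seq = row_major(raster)
z_seq = morton_order(raster)
print(substring_complexity(row_seq))  # 4.5
print(substring_complexity(z_seq))    # 4.0
```

On this toy raster the Z-order sequence keeps each constant block contiguous and scores 4.0, while the row-major sequence scores 4.5. This is a small example of how the choice of linearization can change a 1D estimate. It says nothing about the datasets or results reported in the paper.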




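Source journal

Information Systems (Engineering and Technology; Computer Science: Information Systems)

CiteScore: 9.40
Self-citation rate: 2.70%
Publication volume: 112
Review time: 53 days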

Journal description: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.