Estimating the compressibility of raster data
Martita Muñoz, José Fuentes-Sepúlveda, Cecilia Hernández, Diego Seco
Information Systems, volume 136, Article 102624. Published 2025-09-13. DOI: 10.1016/j.is.2025.102624
https://www.sciencedirect.com/science/article/pii/S0306437925001103
The raster data model is widely used in Geographic Information Systems and image processing. The continuous growth of raster data volume poses significant challenges for storage and management. Compact representations of rasters have emerged as a critical solution to this issue, leveraging data locality to achieve efficient compression. In this context, the research community has proposed compressibility measures that aim to estimate how compressible the data is. Some measures, initially proposed for sequences, have been extended to two- and three-dimensional matrices. This work conducts an experimental analysis of measures applied to raster data compressibility estimation. The first approach applies a linearization function to raster data in matrix representation and then uses existing one-dimensional compressibility measures. The evaluation of this approach compares 1D compressibility measures with 2D measures, data compressors, Compact Data Structures (CDSs), and spatial locality estimation techniques. The results show that spatial locality, alphabet size, and noise directly influence raster compressibility, with a stronger impact on measures like $z$, $v$, and $g$, on compressors (bzip, gzip), and on a CDS called $k^2$-raster. The second approach introduces $\delta_\Delta$, a 2D compressibility measure sensitive to differences within the alphabet values. Its purpose is to refine the estimation of raster compressibility. Results indicate that $\delta_\Delta$ is affected by the actual values and their frequencies, aligning with the outcomes of some specific compressors. This alignment underscores the suitability of $\delta_\Delta$ for compressibility estimation tasks closely related to those performed by such compressors.
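The first approach in the abstract (linearize the raster, then apply a one-dimensional measure) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, row-major and Z-order (Morton) are just two common linearizations, the phrase count comes from a simple greedy LZ77-style parse whose definition may differ from the paper's measure, and `zlib` (DEFLATE, the algorithm behind gzip) and `bz2` stand in for the compressors mentioned.

```python
import bz2
import zlib


def row_major(raster):
    """Linearize a 2D raster by concatenating its rows."""
    return [value for row in raster for value in row]


def z_order(raster):
    """Linearize a square raster (side a power of two) along the Z-order curve."""
    n = len(raster)

    def morton(cell):
        # Interleave the bits of the column (even positions) and row (odd positions).
        r, c = cell
        code = 0
        for bit in range(n.bit_length()):
            code |= ((c >> bit) & 1) << (2 * bit)
            code |= ((r >> bit) & 1) << (2 * bit + 1)
        return code

    cells = sorted(((r, c) for r in range(n) for c in range(n)), key=morton)
    return [raster[r][c] for r, c in cells]


def lz77_phrases(seq):
    """Count phrases of a greedy LZ77-style parse (quadratic time, for small inputs).

    Each phrase is the longest prefix of the unparsed suffix that also starts
    at an earlier position (overlaps allowed), or a single new symbol.
    """
    n = len(seq)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):
            length = 0
            while i + length < n and seq[j + length] == seq[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)
        phrases += 1
    return phrases


def compressed_sizes(seq):
    """Compressed size in bytes of a sequence of values in 0..255."""
    data = bytes(seq)
    return {"zlib": len(zlib.compress(data)), "bz2": len(bz2.compress(data))}


if __name__ == "__main__":
    # A small raster with strong spatial locality: four constant 4x4 quadrants.
    raster = [[(r // 4) * 2 + (c // 4) for c in range(8)] for r in range(8)]
    for name, linearize in (("row-major", row_major), ("z-order", z_order)):
        seq = linearize(raster)
        print(name, lz77_phrases(seq), compressed_sizes(seq))
```

Running the script prints the phrase count and compressed sizes for each linearization of the same raster, which is the kind of comparison the abstract describes between 1D measures and compressors.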
About the journal:
Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems.
Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers having to do with massively parallel data management, fault tolerance in practice, and special purpose hardware for data-intensive systems are also welcome. Manuscripts from application domains, such as urban informatics, social and natural science, and Internet of Things, are also welcome. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.