NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control

Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin
Source: arXiv - CS - Distributed, Parallel, and Cluster Computing
Published: 2024-09-09
DOI: arxiv-2409.05785 (https://doi.org/arxiv-2409.05785)
Citations: 0

Abstract

Large-scale scientific simulations generate massive datasets that pose significant challenges for storage and I/O. While traditional lossy compression techniques can improve performance, balancing compression ratio, data quality, and throughput remains difficult. To address this, we propose NeurLZ, a novel cross-field learning-based and error-controlled compression framework for scientific data. By integrating skipping DNN models, cross-field learning, and error control, our framework aims to substantially enhance lossy compression performance. Our contributions are three-fold: (1) We design a lightweight skipping model to provide high-fidelity detail retention, further improving prediction accuracy. (2) We adopt a cross-field learning approach to significantly improve data prediction accuracy, resulting in a substantially improved compression ratio. (3) We develop an error control approach to provide strict error bounds according to user requirements. We evaluated NeurLZ on several real-world HPC application datasets, including Nyx (cosmological simulation), Miranda (large turbulence simulation), and Hurricane (weather simulation). Experiments demonstrate that our framework achieves up to a 90% relative reduction in bit rate under the same data distortion, compared to the best existing approach.
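The abstract does not spell out how the error control works, so the following is only a minimal sketch of one generic way a learned correction can be kept inside a user-set absolute error bound. It is not NeurLZ's implementation. The function names, the fallback mask, and the synthetic data are all assumptions made for illustration: the compressor side, which still has the original values, accepts a predicted correction only where the result stays within the bound and otherwise keeps the already-bounded decompressed value.

```python
import random


def enhance_with_bound(original, decompressed, predicted_residual, eb):
    """Add a predicted correction, falling back wherever it would break the bound.

    Assumes |x - x_dec| <= eb already holds for every decompressed value.
    Returns (enhanced, fallback_mask); the mask would have to be stored with the
    compressed stream so a decompressor (which lacks `original`) can repeat the choice.
    """
    enhanced, fallback_mask = [], []
    for x, x_dec, r in zip(original, decompressed, predicted_residual):
        candidate = x_dec + r
        within_bound = abs(x - candidate) <= eb
        enhanced.append(candidate if within_bound else x_dec)
        fallback_mask.append(not within_bound)
    return enhanced, fallback_mask


def relative_bitrate_reduction(bitrate_base, bitrate_new):
    """Fraction of bits per value saved relative to the baseline."""
    return (bitrate_base - bitrate_new) / bitrate_base


random.seed(0)
eb = 1e-2
original = [random.uniform(-1.0, 1.0) for _ in range(1000)]
# Stand-in for an error-bounded compressor: perturb each value by less than eb.
decompressed = [x + random.uniform(-0.9 * eb, 0.9 * eb) for x in original]
# Stand-in for a network output: a noisy estimate of the true residual.
predicted = [(x - d) + random.gauss(0.0, 0.5 * eb)
             for x, d in zip(original, decompressed)]

enhanced, mask = enhance_with_bound(original, decompressed, predicted, eb)
max_err = max(abs(x - e) for x, e in zip(original, enhanced))
print(f"max error {max_err:.4g} (bound {eb}), fallbacks: {sum(mask)}")
print(f"relative reduction: {relative_bitrate_reduction(2.0, 0.2):.0%}")
```

The bit rates 2.0 and 0.2 bits per value in the last line are made-up numbers. They only illustrate the arithmetic: a 90% relative reduction means the bit rate falls to one tenth of the baseline, which corresponds to a tenfold higher compression ratio at the same distortion.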