Aird-MSI: A High Compression Rate and Decompression Speed Format for Mass Spectrometry Imaging Data.

Impact factor 3.6; Zone 2 (Biology); JCR Q1, Biochemical Research Methods
Shuochao Li, Hongping Sheng, Pengyuan Du, Jingying Chen, Xixi Wang, Junjie Tong, Jiahua Hong, Xiaohan Jing, Miaoshan Lu, Changbin Yu
Journal of Proteome Research. Published 2025-10-17. Publication type: Journal Article.
DOI: 10.1021/acs.jproteome.5c00423 (https://doi.org/10.1021/acs.jproteome.5c00423)

Abstract

Mass spectrometry imaging has emerged as a pivotal tool in spatial metabolomics, yet its reliance on the imzML format poses critical challenges in data storage, transmission, and computational efficiency. While imzML ensures cross-platform compatibility, its weakly compressed binary architecture results in large file sizes and high parsing overhead, hindering cloud-based analysis and real-time visualization. This study introduces an enhanced Aird compression format optimized for spatial metabolomics through two innovations: (1) a dynamic combinatorial compression algorithm for integer-based encoding of m/z and intensity data, and (2) a coordinate-separation storage strategy for rapid spatial indexing. Experimental validation on 47 public data sets demonstrated significant performance gains. Compared with imzML, Aird achieved a roughly 70% reduction in storage footprint (mean compression ratio: 30.89%) while maintaining near-lossless data precision (F1-score = 99.75% at 0.1 ppm m/z tolerance). For high-precision-controlled data sets, Aird accelerated loading 13-fold in MZmine. The Aird format overcomes crucial bottlenecks in spatial metabolomics by harmonizing storage efficiency, computational speed, and analytical precision, reducing I/O latency for large cohorts. By achieving near-native feature detection accuracy, Aird establishes a robust infrastructure for translational applications, including disease biomarker discovery and pharmacokinetic imaging.
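The abstract does not spell out the compression algorithm, so the following is only a minimal sketch of the general idea behind integer-based encoding with a dynamically chosen codec combination. It is not the Aird implementation. The scale factor `MZ_SCALE`, the helper names, and the candidate codecs (`zlib`, `bz2`, `lzma` from the Python standard library) are all assumptions made for illustration: m/z values are converted to fixed-point integers, optionally delta-encoded, and every combination is tried so that the smallest result is kept.

```python
import bz2
import lzma
import struct
import zlib

# Hypothetical fixed-point scale: 1e5 keeps five decimal places of m/z.
MZ_SCALE = 100_000

# Candidate byte-level codecs (an assumption, not the codecs Aird uses).
BYTE_CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def to_fixed_point(mz_values, scale=MZ_SCALE):
    """Convert floating-point m/z values to integers at a fixed precision."""
    return [round(mz * scale) for mz in mz_values]


def delta_encode(ints):
    """Keep the first value, then store successive differences."""
    if not ints:
        return []
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def pack(ints):
    """Serialize integers as little-endian signed 64-bit values."""
    return struct.pack(f"<{len(ints)}q", *ints)


def unpack(raw):
    """Invert pack."""
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def compress_best(ints):
    """Try each (delta on/off, byte codec) pair and keep the smallest output."""
    best = None
    for use_delta in (False, True):
        payload = pack(delta_encode(ints) if use_delta else ints)
        for name, (compress, _) in BYTE_CODECS.items():
            blob = compress(payload)
            if best is None or len(blob) < len(best[2]):
                best = (use_delta, name, blob)
    return best


def decompress(use_delta, name, blob):
    """Restore the integer list from the chosen combination."""
    ints = unpack(BYTE_CODECS[name][1](blob))
    return delta_decode(ints) if use_delta else ints


if __name__ == "__main__":
    # Synthetic, sorted m/z values for demonstration only (not real MSI data).
    mz = [100.00012 + 0.01337 * i for i in range(2000)]
    ints = to_fixed_point(mz)
    use_delta, codec, blob = compress_best(ints)
    assert decompress(use_delta, codec, blob) == ints
    print(codec, use_delta, len(blob) / len(pack(ints)))
```

In this sketch the only lossy step is the fixed-point rounding, whose error is bounded by the chosen scale; everything after it round-trips exactly. The printed value is the compressed size divided by the packed integer size, which is the sense in which a compression ratio of 30.89% corresponds to a storage reduction of about 69%.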

Source journal: Journal of Proteome Research (Biology: Biochemical Research Methods)
CiteScore: 9.00
Self-citation rate: 4.50%
Articles published: 251
Review time: 3 months
About the journal: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".