Aird-MSI: A High Compression Rate and Decompression Speed Format for Mass Spectrometry Imaging Data
Authors: Shuochao Li, Hongping Sheng, Pengyuan Du, Jingying Chen, Xixi Wang, Junjie Tong, Jiahua Hong, Xiaohan Jing, Miaoshan Lu, Changbin Yu
Journal: Journal of Proteome Research (impact factor 3.6; JCR Q1, Biochemical Research Methods)
Publication date: 2025-10-17
Publication type: Journal Article
DOI: 10.1021/acs.jproteome.5c00423 (https://doi.org/10.1021/acs.jproteome.5c00423)
Citations: 0
Abstract
Mass spectrometry imaging has emerged as a pivotal tool in spatial metabolomics, yet its reliance on the imzML format poses critical challenges in data storage, transmission, and computational efficiency. While imzML ensures cross-platform compatibility, its weakly compressed binary architecture results in large file sizes and high parsing overhead, hindering cloud-based analysis and real-time visualization. This study introduces an enhanced Aird compression format optimized for spatial metabolomics through two innovations: (1) a dynamic combinatorial compression algorithm for integer-based encoding of m/z and intensity data; and (2) a coordinate-separation storage strategy for rapid spatial indexing. Experimental validation on 47 public data sets demonstrated significant performance gains. Compared to imzML, Aird reduced the storage footprint by about 70% (mean compression ratio: 30.89%) while maintaining near-lossless data precision (F1-score = 99.75% at 0.1 ppm m/z tolerance). For high-precision-controlled data sets, Aird accelerated loading 13-fold in MZmine. The Aird format overcomes crucial bottlenecks in spatial metabolomics by harmonizing storage efficiency, computational speed, and analytical precision, reducing I/O latency for large cohorts. By achieving near-native feature detection accuracy, Aird establishes a robust infrastructure for translational applications, including disease biomarker discovery and pharmacokinetic imaging.
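The abstract does not spell out the compression algorithm, so the following is only a minimal sketch of the general idea behind integer-based encoding of m/z values. It assumes fixed-point scaling, delta encoding, and zlib, none of which is confirmed here as Aird's actual method. The names `encode_mz`, `decode_mz`, `storage_reduction`, and the `decimals` parameter are hypothetical and chosen for illustration.

```python
import struct
import zlib


def encode_mz(mz_values, decimals=5):
    """Illustrative encoder: fixed-point integers, delta coding, then zlib.

    Assumption: `decimals` fractional digits are kept, so the scaled values
    must fit in a signed 32-bit integer.
    """
    scale = 10 ** decimals
    ints = [round(mz * scale) for mz in mz_values]
    # Delta-encode against the previous value (the first delta is the value itself).
    deltas = [cur - prev for prev, cur in zip([0] + ints, ints)]
    raw = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(raw, 9)


def decode_mz(blob, decimals=5):
    """Inverse of encode_mz: inflate, unpack, and accumulate the deltas."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    scale = 10 ** decimals
    restored, running = [], 0
    for delta in deltas:
        running += delta
        restored.append(running / scale)
    return restored


def max_ppm_error(original, restored):
    """Largest relative round-trip error, in parts per million."""
    return max(abs(r - o) / o * 1e6 for o, r in zip(original, restored))


def storage_reduction(compression_ratio):
    """Compression ratio (compressed / original size) -> fraction of storage saved."""
    return 1.0 - compression_ratio


if __name__ == "__main__":
    # Synthetic, made-up m/z axis; not data from the paper.
    mz = [100.0 + 0.0123 * i for i in range(1000)]
    blob = encode_mz(mz)
    restored = decode_mz(blob)
    ratio = len(blob) / (8 * len(mz))  # baseline: 8-byte floats
    print(f"compression ratio: {ratio:.2%}")
    print(f"max round-trip error: {max_ppm_error(mz, restored):.4f} ppm")
```

Under this definition a compression ratio is compressed size divided by original size, so the reported mean ratio of 30.89% corresponds to a 1 - 0.3089 = 69.11% reduction, which matches the roughly 70% figure in the abstract. The round-trip error in such a scheme is bounded by the fixed-point precision chosen, which is one way a format can be near-lossless rather than strictly lossless.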
About the journal:
Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".