TERSE/PROLIX (TRPX) - a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
Senik Matinyan, Jan Pieter Abrahams
Acta Crystallographica Section A: Foundations and Advances, pp. 536-541. Published 2023-11-01 (Epub 2023-09-25).
DOI: https://doi.org/10.1107/S205327332300760X
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10626653/pdf/
Abstract
High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (TRPX for short) is presented: a novel lossless compression algorithm designed specifically for diffraction data. The algorithm is compared, in terms of compression efficiency and speed, with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all of these algorithms in both speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate) and more than 3 times faster than LZ4, the runner-up in speed, which had a much worse compression rate. TRPX files are byte-order independent, and once compiled the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, a custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
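The kind of comparison described in the abstract, compression rate and speed measured per codec, can be illustrated with a minimal Python sketch. It is not the authors' benchmark and does not implement TRPX: it only times two general-purpose codecs from the Python standard library (zlib, which uses the same DEFLATE method as gzip, and bz2) on a synthetic frame. The frame generator `synthetic_frame`, its dimensions and its pixel statistics are assumptions chosen for illustration, not data from the paper.

```python
import bz2
import random
import struct
import time
import zlib


def synthetic_frame(width=256, height=256, spots=50, seed=0):
    """Build a hypothetical 16-bit frame: weak background plus a few bright spots.

    This is an illustrative stand-in for a diffraction image, not real data.
    """
    rng = random.Random(seed)
    pixels = [rng.randint(0, 3) for _ in range(width * height)]
    for _ in range(spots):
        pixels[rng.randrange(width * height)] = rng.randint(1000, 60000)
    # Little-endian unsigned 16-bit integers, one per pixel.
    return struct.pack(f"<{len(pixels)}H", *pixels)


# Each entry maps a label to a (compress, decompress) pair.
CODECS = {
    "zlib/DEFLATE (level 6)": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bzip2 (level 9)": (lambda d: bz2.compress(d, 9), bz2.decompress),
}


def benchmark(data, codecs):
    """Return {label: (compression ratio, compression time in seconds)}."""
    results = {}
    for label, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        # A lossless codec must reproduce the input exactly.
        assert decompress(packed) == data
        results[label] = (len(data) / len(packed), elapsed)
    return results


if __name__ == "__main__":
    frame = synthetic_frame()
    for label, (ratio, seconds) in benchmark(frame, CODECS).items():
        print(f"{label}: ratio {ratio:.1f}x, {seconds * 1e3:.1f} ms")
```

Timings from a sketch like this depend heavily on the machine, the codec settings and the data, so they say nothing about the figures reported in the abstract; the point is only the shape of the measurement (round-trip check, ratio, elapsed time).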
About the journal:
Acta Crystallographica Section A: Foundations and Advances publishes articles reporting advances in the theory and practice of all areas of crystallography in the broadest sense. As well as traditional crystallography, this includes nanocrystals, metacrystals, amorphous materials, quasicrystals, synchrotron and XFEL studies, coherent scattering, diffraction imaging, time-resolved studies and the structure of strain and defects in materials.
The journal has two parts, a rapid-publication Advances section and the traditional Foundations section. Articles for the Advances section are of particularly high value and impact. They receive expedited treatment and may be highlighted by an accompanying scientific commentary article and a press release. Further details are given in the November 2013 Editorial.
The central themes of the journal are, on the one hand, experimental and theoretical studies of the properties and arrangements of atoms, ions and molecules in condensed matter, periodic, quasiperiodic or amorphous, ideal or real, and, on the other, the theoretical and experimental aspects of the various methods to determine these properties and arrangements.