Data reduction in protein serial crystallography

IF 2.9; Zone 2 (Materials Science); Q2, CHEMISTRY, MULTIDISCIPLINARY
IUCrJ Pub Date : 2024-03-01 DOI:10.1107/S205225252400054X
Marina Galchenkova, Alexandra Tolstikova, Bjarne Klopprogge, Janina Sprenger, Dominik Oberthuer, Wolfgang Brehm, Thomas A. White, Anton Barty, Henry N. Chapman, Oleksandr Yefanov, G. Williams (Editor)
IUCrJ, volume 11, issue 2, pages 190-201. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10916297/pdf/

Abstract

Various approaches for lossless and lossy compression are evaluated, and suitable quality assessment metrics for serial crystallographic data – used in combination with lossy data reduction – are described.

Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this advancement also entails a notable increase in the volume of collected data. Today, up to 2 PB of data per experiment can easily be obtained under efficient operating conditions. The combined costs associated with storing data from multiple experiments provide a compelling incentive to develop strategies that effectively reduce the amount of data stored on disk while maintaining the quality of scientific outcomes. Lossless data-compression methods are designed to preserve the information content of the data but often struggle to achieve a high compression ratio when applied to experimental data that contain noise. Conversely, lossy compression methods offer the potential to greatly reduce the data volume. Nonetheless, it is vital to thoroughly assess the impact on data quality and scientific outcomes when employing lossy compression, as it inherently involves discarding information. The evaluation of lossy compression effects on data requires proper data quality metrics. In our research, we assess various approaches for both lossless and lossy compression techniques applied to SX data, and equally importantly, we describe metrics suitable for evaluating SX data quality.
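
The contrast between lossless and lossy compression of noisy data can be illustrated with a minimal sketch. It is not the authors' method: the synthetic background (mean 100 counts, standard deviation 10), the quantization step of 8, the use of zlib, and the helper names compression_ratio and quantize are all assumptions chosen for illustration. The maximum absolute error printed at the end is only a crude sanity check, not one of the SX quality metrics described in the paper.

```python
import random
import zlib
from array import array


def compression_ratio(values):
    """Return raw size / zlib-compressed size for 16-bit unsigned pixel values."""
    raw = array("H", values).tobytes()
    return len(raw) / len(zlib.compress(raw, 9))


def quantize(values, step):
    """Lossy step: round each value to the nearest multiple of `step`."""
    return [step * ((v + step // 2) // step) for v in values]


rng = random.Random(0)
# Synthetic noisy background (assumed values, not taken from the paper).
pixels = [max(0, round(rng.gauss(100, 10))) for _ in range(100_000)]

# Lossless: every noisy value must be reproduced exactly.
lossless_ratio = compression_ratio(pixels)

# Lossy: discard the low-order noise first, then compress losslessly.
reduced = quantize(pixels, step=8)
lossy_ratio = compression_ratio(reduced)
max_error = max(abs(a - b) for a, b in zip(pixels, reduced))

print(f"lossless ratio: {lossless_ratio:.2f}")
print(f"lossy ratio (step 8): {lossy_ratio:.2f}")
print(f"max abs error per pixel: {max_error}")
```

Quantization bounds the per-pixel error by half the step, but whether such an error is acceptable has to be judged on the downstream scientific result, which is why the abstract stresses dedicated data quality metrics.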

Source journal
IUCrJ
Categories: CHEMISTRY, MULTIDISCIPLINARY; CRYSTALLOGRAPHY
CiteScore: 7.50
Self-citation rate: 5.10%
Articles published: 95
Review time: 10 weeks
About the journal: IUCrJ is a new fully open-access peer-reviewed journal from the International Union of Crystallography (IUCr). The journal will publish high-profile articles on all aspects of the sciences and technologies supported by the IUCr via its commissions, including emerging fields where structural results underpin the science reported in the article. Our aim is to make IUCrJ the natural home for high-quality structural science results. Chemists, biologists, physicists and material scientists will be actively encouraged to report their structural studies in IUCrJ.