Filling data analysis gaps in time-resolved crystallography by machine learning.

IF 2.3 | Zone 2, Physics and Astrophysics | JCR Q3, CHEMISTRY, PHYSICAL
Structural Dynamics-Us Pub Date : 2025-01-21 eCollection Date: 2025-01-01 DOI:10.1063/4.0000280
Justin Trujillo, Russell Fung, Madan Kumar Shankar, Peter Schwander, Ahmad Hosseinizadeh
Structural Dynamics-Us, volume 12, issue 1, article 014101. Publication type: Journal Article. Full text: https://doi.org/10.1063/4.0000280 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11758283/pdf/
Citations: 0

Abstract

X-ray crystallography experiments are fueling a growing understanding of the structural dynamics of biological molecules. Time-resolved serial femtosecond crystallography (TR-SFX) with x-ray free-electron lasers allows the measurement of ultrafast structural changes in proteins. Nevertheless, this technique comes with some limitations. One major challenge is data quality: TR-SFX measurements often suffer from data sparsity, partial recording of Bragg reflections, timing errors, and pixel noise. To overcome these difficulties, large volumes of data are conventionally collected and grouped into a few temporal bins. The data in each bin are then averaged and paired with the mean of their corresponding jittered timestamps. This procedure provides one structure per bin, resulting in a limited number of averaged structures for the entire time interval spanned by the experiment. The information on ultrafast structural dynamics at high temporal resolution is therefore lost. This has prompted research into advanced methods of analyzing experimental TR-SFX data beyond the standard binning and averaging method. To address this problem, we use a machine learning algorithm called Nonlinear Laplacian Spectral Analysis (NLSA), which has emerged as a promising technique for studying the dynamics of complex systems. In this work, we demonstrate the power of this algorithm using synthetic x-ray diffraction snapshots from a protein with significant data incompleteness, timing uncertainties, and noise. Our study confirms that NLSA is a suitable approach that effectively mitigates the effects of these artifacts in TR-SFX data and recovers accurate structural dynamics information hidden in such data.
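The conventional binning and averaging procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `bin_and_average`, the use of equal-width bins, and the convention that unrecorded pixels are stored as NaN are all assumptions made for illustration.

```python
import numpy as np


def bin_and_average(snapshots, timestamps, n_bins):
    """Group snapshots into equal-width time bins and average each bin.

    snapshots  : array of shape (n_snapshots, n_pixels); NaN marks a value
                 that was not recorded (assumed convention for sparsity).
    timestamps : jittered time stamp of each snapshot.
    Returns the mean timestamp and the mean snapshot of every bin.
    """
    snapshots = np.asarray(snapshots, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)

    # Equal-width bin edges over the measured time range (an assumption).
    edges = np.linspace(timestamps.min(), timestamps.max(), n_bins + 1)
    # Using only the interior edges yields bin indices 0 .. n_bins - 1.
    bin_index = np.digitize(timestamps, edges[1:-1])

    mean_times = np.full(n_bins, np.nan)
    mean_frames = np.full((n_bins, snapshots.shape[1]), np.nan)
    for b in range(n_bins):
        members = bin_index == b
        if not members.any():
            continue  # empty bin stays NaN
        mean_times[b] = timestamps[members].mean()
        frames = snapshots[members]
        recorded = ~np.isnan(frames)
        counts = recorded.sum(axis=0)
        sums = np.where(recorded, frames, 0.0).sum(axis=0)
        # Average only over snapshots that actually recorded the pixel.
        mean_frames[b] = np.divide(
            sums, counts, out=np.full(sums.shape, np.nan), where=counts > 0
        )
    return mean_times, mean_frames


# Synthetic demonstration: 200 snapshots of a 5-pixel signal with timing
# jitter and roughly half of the values missing.
rng = np.random.default_rng(0)
true_times = np.linspace(0.0, 1.0, 200)
jittered = true_times + rng.normal(scale=0.02, size=true_times.size)
signal = np.sin(2 * np.pi * true_times)[:, None] * np.ones((1, 5))
signal[rng.random(signal.shape) < 0.5] = np.nan

times, frames = bin_and_average(signal, jittered, n_bins=4)
print(times.shape, frames.shape)  # (4,) (4, 5)
```

With 200 snapshots reduced to 4 bins, only 4 averaged frames remain, which is the loss of temporal resolution that the abstract attributes to this procedure.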

Source journal: Structural Dynamics-Us
Categories: CHEMISTRY, PHYSICAL; PHYSICS, ATOMIC, MOLECULAR & CHEMICAL
CiteScore: 5.50
Self-citation rate: 3.60%
Articles published: 24
Review time: 16 weeks
Journal description: Structural Dynamics focuses on recent developments in experimental and theoretical methods and techniques that allow real-time visualization of electronic and geometric structural changes in chemical, biological, and condensed-matter systems. The community of scientists and engineers working on structural dynamics in such diverse systems often uses similar instrumentation and methods. The journal welcomes articles dealing with fundamental problems of electronic and structural dynamics that are tackled by new methods, such as: time-resolved X-ray and electron diffraction and scattering; coherent diffractive imaging; time-resolved X-ray spectroscopies (absorption, emission, resonant inelastic scattering, etc.); time-resolved electron energy loss spectroscopy (EELS) and electron microscopy; time-resolved photoelectron spectroscopies (UPS, XPS, ARPES, etc.); multidimensional spectroscopies in the infrared, the visible, and the ultraviolet; nonlinear spectroscopies in the VUV, the soft, and the hard X-ray domains; theory and computational methods and algorithms for the analysis and description of structural dynamics and their associated experimental signals. These new methods are enabled by new instrumentation, such as: X-ray free electron lasers, which provide flux, coherence, and time resolution; new sources of ultrashort electron pulses; new sources of ultrashort vacuum ultraviolet (VUV) to hard X-ray pulses, such as high-harmonic generation (HHG) sources or plasma-based sources; new sources of ultrashort infrared and terahertz (THz) radiation; new detectors for X-rays and electrons; new sample handling and delivery schemes; new computational capabilities.