CrysFormer: Protein structure determination via Patterson maps, deep learning, and partial structure attention.
Tom Pan, Chen Dun, Shikai Jin, Mitchell D Miller, Anastasios Kyrillidis, George N Phillips
Structural Dynamics-Us, published 2024-08-14 (Epub 2024-07-01). DOI: 10.1063/4.0000252 (https://doi.org/10.1063/4.0000252)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11326852/pdf/
Abstract
Determining the atomic-level structure of a protein has been a decades-long challenge. However, recent advances in transformers and related neural network architectures have enabled researchers to significantly improve solutions to this problem. These methods use large datasets of sequence information and corresponding known protein template structures, if available. Yet, such methods only focus on sequence information. Other available prior knowledge could also be utilized, such as constructs derived from x-ray crystallography experiments and the known structures of the most common conformations of amino acid residues, which we refer to as partial structures. To the best of our knowledge, we propose the first transformer-based model that directly utilizes experimental protein crystallographic data and partial structure information to calculate electron density maps of proteins. In particular, we use Patterson maps, which can be directly obtained from x-ray crystallography experimental data, thus bypassing the well-known crystallographic phase problem. We demonstrate that our method, CrysFormer, achieves precise predictions on two synthetic datasets of peptide fragments in crystalline forms, one with two residues per unit cell and the other with fifteen. These predictions can then be used to generate accurate atomic models using established crystallographic refinement programs.
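The abstract's remark that Patterson maps sidestep the phase problem can be illustrated with a small numerical sketch. This is not the CrysFormer model or its data pipeline. It only shows the standard relationship that a Patterson map is the Fourier transform of the squared structure-factor amplitudes, which makes it the autocorrelation of the electron density. The grid size, the atom positions and weights, and the function name `patterson_map` are all assumptions made up for this example.

```python
import numpy as np

def patterson_map(density):
    """Patterson map of a periodic density grid, using amplitudes only.

    Illustrative helper (hypothetical name): the phases of the structure
    factors are discarded, mimicking what a diffraction experiment measures.
    """
    structure_factors = np.fft.fftn(density)
    intensities = np.abs(structure_factors) ** 2  # phase information is lost here
    return np.fft.ifftn(intensities).real

# Toy "unit cell": three point-like atoms on an 8x8x8 grid (made-up values).
rho = np.zeros((8, 8, 8))
rho[1, 2, 3] = 6.0
rho[4, 4, 1] = 7.0
rho[6, 1, 5] = 8.0

p = patterson_map(rho)

# Translating the density changes the phases but not the amplitudes,
# so the Patterson map is unchanged.
p_shifted = patterson_map(np.roll(rho, shift=(2, 3, 1), axis=(0, 1, 2)))
same_map = np.allclose(p, p_shifted)

# Peaks sit at interatomic vectors: atoms at (1,2,3) and (4,4,1) differ by
# (3,2,-2), i.e. (3,2,6) modulo the grid, with height 6 * 7.
pair_peak = p[3, 2, 6]
```

In this toy setting the origin peak `p[0, 0, 0]` equals the sum of squared atom weights, and every peak appears together with its inverse, so the map encodes interatomic vectors rather than atom positions. Recovering the density from such a map is the inverse problem the abstract describes.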
Journal: Structural Dynamics-Us
Categories: CHEMISTRY, PHYSICAL; PHYSICS, ATOMIC, MOLECULAR & CHEMICAL
CiteScore: 5.50
Self-citation rate: 3.60%
Articles published: 24
Review time: 16 weeks
About the journal:
Structural Dynamics focuses on recent developments in experimental and theoretical methods and techniques that allow real-time visualization of electronic and geometric structural changes in chemical, biological, and condensed-matter systems. The community of scientists and engineers working on structural dynamics in such diverse systems often uses similar instrumentation and methods.
The journal welcomes articles dealing with fundamental problems of electronic and structural dynamics that are tackled by new methods, such as:
Time-resolved X-ray and electron diffraction and scattering,
Coherent diffractive imaging,
Time-resolved X-ray spectroscopies (absorption, emission, resonant inelastic scattering, etc.),
Time-resolved electron energy loss spectroscopy (EELS) and electron microscopy,
Time-resolved photoelectron spectroscopies (UPS, XPS, ARPES, etc.),
Multidimensional spectroscopies in the infrared, the visible and the ultraviolet,
Nonlinear spectroscopies in the VUV, the soft and the hard X-ray domains,
Theory and computational methods and algorithms for the analysis and description of structural dynamics and their associated experimental signals.
These new methods are enabled by new instrumentation, such as:
X-ray free electron lasers, which provide flux, coherence, and time resolution,
New sources of ultrashort electron pulses,
New sources of ultrashort vacuum ultraviolet (VUV) to hard X-ray pulses, such as high-harmonic generation (HHG) sources or plasma-based sources,
New sources of ultrashort infrared and terahertz (THz) radiation,
New detectors for X-rays and electrons,
New sample handling and delivery schemes,
New computational capabilities.