SAXS-A-FOLD: a website for fast ensemble modeling optimizing the fit of AlphaFold or user-supplied protein structures with flexible regions to SAXS data.
Impact factor 6.1 · Zone 3 (Materials Science) · Q1 (Biochemistry, Genetics and Molecular Biology)
Emre Brookes, Joseph E Curtis, Aaron Householder, Mattia Rocco
Journal of Applied Crystallography, volume 58 (Pt 3), pages 1034-1049. Journal Article, published 2025-05-29. DOI: https://doi.org/10.1107/S1600576725003590. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12135990/pdf/
Citations: 0
Abstract
AI programs such as AlphaFold (AF) are having a major impact on structural biology. However, predicted unstructured regions, the arrangement of linker-connected domains and their conformational changes in response to environmental variables present challenges that are not easily dealt with on purely computational grounds. An approach that uses predicted (or solved) protein modules/domains linked by potentially unstructured regions and that generates ensembles of models optimized against small-angle X-ray scattering (SAXS) data has been recently described [Brookes et al. (2023). J. Appl. Cryst. 56, 910-926]. Its implementation on a public-domain website, SAXS-A-FOLD (https://saxsafold.genapp.rocks), is presented here. User-supplied SAXS experimental intensity I(q) versus scattering vector magnitude q and the derived pair-wise distance distribution function P(r) versus r are first uploaded. An AF or user-supplied structure (currently only single chains without prosthetic groups) is then uploaded and displayed, and its SAXS I(q) and P(r) profiles are computed and compared with the experimental data. If uploaded from AF, the structure is color-coded by the associated confidence level: on this basis, the website automatically proposes potential flexible regions that can be user modified. For user-supplied structures, these regions have to be directly entered. A starting pool of typically 10-50 × 10³ conformations is generated using a Monte Carlo method that samples backbone dihedral angles along the chosen segments of potential flexibility in the protein structures. The initial pool is reduced to obtain a tractable set of models, for which P(r) and I(q) are computed with fast established methods. A global fit is performed using non-negatively constrained least-squares (NNLS) versus original data. The P(r) and I(q) NNLS results are then displayed, showing both the reconstructed curves and the contributing model curves, with their percentage contributions.
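The NNLS step described above can be illustrated with a minimal sketch. It is not the SAXS-A-FOLD implementation: the function name `nnls_coordinate_descent`, the toy P(r)-like histograms and the simple coordinate-descent solver are all assumptions made for illustration. In practice a library routine such as `scipy.optimize.nnls` would normally be used instead.

```python
def nnls_coordinate_descent(curves, target, sweeps=200):
    """Minimise ||sum_j w_j * curves[j] - target||^2 subject to w_j >= 0.

    Hypothetical helper: a plain coordinate-descent solver standing in
    for a library NNLS routine. `curves` is a list of model curves, each
    sampled on the same grid as `target`.
    """
    n = len(curves)
    # Gram matrix (curve-curve dot products) and curve-target dot products.
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in curves] for ci in curves]
    rhs = [sum(a * b for a, b in zip(ci, target)) for ci in curves]
    weights = [0.0] * n
    for _ in range(sweeps):
        for j in range(n):
            if gram[j][j] == 0.0:
                continue  # an all-zero curve cannot contribute
            grad = sum(gram[j][k] * weights[k] for k in range(n)) - rhs[j]
            # Exact minimisation along coordinate j, clipped at zero.
            weights[j] = max(0.0, weights[j] - grad / gram[j][j])
    return weights


# Toy, well-separated model curves (illustrative values, not real data).
model_curves = [
    [1.0, 1.0, 0.0, 0.0],  # compact conformation
    [0.0, 1.0, 1.0, 0.0],  # intermediate conformation
    [0.0, 0.0, 1.0, 1.0],  # extended conformation
]
# Synthetic "experimental" curve: 70% of the first model, 30% of the third.
experimental = [0.7, 0.7, 0.3, 0.3]

weights = nnls_coordinate_descent(model_curves, experimental)
total = sum(weights)
percentages = [round(100.0 * w / total, 1) for w in weights]
print(percentages)
```

For this synthetic input the percentage contributions should come out close to 70, 0 and 30, mirroring the way the fit reports which models contribute and by how much. Real scattering curves are far more correlated than these toy histograms, so a robust active-set solver is preferable to this simple loop.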
A WAXSiS (https://waxsis.uni-saarland.de) implementation is utilized to calculate an I(q) for each selected model. These sets can be enhanced by adding a user-defined number of models generated before and after each selected model in the original Monte Carlo pool, ensuring the inclusion of nearby models that might better fit the data. Finally, NNLS is used on the WAXSiS-generated I(q) set versus the original I(q) data, with the results displaying the contributing models and their I(q). Aside from being representative of contributing conformations, the models selected by SAXS-A-FOLD could constitute a set of starting structures for more advanced MD simulations.
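The enhancement step, which adds models generated just before and after each selected model in the Monte Carlo pool, can be sketched as simple index bookkeeping. The function name `expand_with_neighbors`, the zero-based indexing in generation order, the clamping at the ends of the pool and the merging of overlapping windows are assumptions for illustration, not details confirmed by the abstract.

```python
def expand_with_neighbors(selected, pool_size, n_neighbors):
    """Return the sorted pool indices of the selected models plus up to
    `n_neighbors` models generated immediately before and after each one.

    Hypothetical helper: indices are assumed to follow generation order,
    windows are clamped to the pool and overlapping windows are merged.
    """
    expanded = set()
    for idx in selected:
        first = max(0, idx - n_neighbors)
        last = min(pool_size - 1, idx + n_neighbors)
        expanded.update(range(first, last + 1))
    return sorted(expanded)


# Two selected models in a pool of 12, with 2 neighbours on each side.
print(expand_with_neighbors([3, 10], pool_size=12, n_neighbors=2))
# -> [1, 2, 3, 4, 5, 8, 9, 10, 11]
```

Because consecutive Monte Carlo models differ by small dihedral changes, the neighbors are likely to be structurally close to the selected model, which is why they are candidates for a better fit.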
About the journal:
Many research topics in condensed matter research, materials science and the life sciences make use of crystallographic methods to study crystalline and non-crystalline matter with neutrons, X-rays and electrons. Articles published in the Journal of Applied Crystallography focus on these methods and their use in identifying structural and diffusion-controlled phase transformations, structure-property relationships, structural changes of defects, interfaces and surfaces, etc. Developments of instrumentation and crystallographic apparatus, theory and interpretation, numerical analysis and other related subjects are also covered. The journal is the primary place where crystallographic computer program information is published.