AlphaFold2's training set powers its predictions of some fold-switched conformations.

IF 4.5 | Region 3 (Biology) | Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY
Protein Science Pub Date : 2025-04-01 DOI:10.1002/pro.70105
Joseph W Schafer, Lauren L Porter
Protein Science, volume 34, issue 4, e70105. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934219/pdf/
Citations: 0

Abstract

AlphaFold2 (AF2), a deep-learning-based model that predicts protein structures from their amino acid sequences, has recently been used to predict multiple protein conformations. In some cases, AF2 has successfully predicted both dominant and alternative conformations of fold-switching proteins, which remodel their secondary and/or tertiary structures in response to cellular stimuli. Whether AF2 has learned enough protein folding principles to reliably predict alternative conformations outside of its training set is unclear. Previous work suggests that AF2 predicted these alternative conformations by memorizing them during training. Here, we use CFold, an implementation of the AF2 network trained on a more limited subset of experimentally determined protein structures, to directly test how well the AF2 architecture predicts alternative conformations of fold switchers outside of its training set. We tested CFold on eight fold switchers from six protein families. These proteins, whose secondary structures switch between α-helix and β-sheet and/or whose hydrogen bonding networks are reconfigured dramatically, had not been tested previously, and only one of their alternative conformations was in CFold's training set. Successful CFold predictions would indicate that the AF2 architecture can predict disparate alternative conformations of fold switchers outside of its training set, while unsuccessful predictions would suggest that AF2 predictions of these alternative conformations likely arise from association with structures learned during training. Despite sampling 1300 to 4300 structures per protein with various sequence sampling techniques, CFold predicted only one alternative structure outside of its training set accurately and with high confidence, while also generating experimentally inconsistent structures with higher confidence. Though these results indicate that AF2's current success in predicting alternative conformations of fold switchers stems largely from its training data, results from a sequence pruning technique suggest developments that could lead to a more reliable generative model in the future.
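The evaluation logic described in the abstract (is a sampled structure accurate, and is it confident?) can be illustrated with a small sketch. This is not the authors' pipeline. The abstract does not name its accuracy or confidence metrics, so the sketch assumes a structural similarity score in the range 0 to 1 (such as a TM-score) against each experimental conformation and a mean per-residue confidence in the range 0 to 100 (such as pLDDT). The cutoffs, class names, and toy values are hypothetical and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    name: str
    sim_to_dominant: float     # assumed similarity to the dominant fold, 0..1
    sim_to_alternative: float  # assumed similarity to the alternative fold, 0..1
    confidence: float          # assumed mean confidence score, 0..100


# Hypothetical cutoffs chosen for illustration only.
SIM_CUTOFF = 0.6
CONF_CUTOFF = 70.0


def classify(p: Prediction) -> tuple[str, bool]:
    """Return (fold label, is_confident) for one sampled structure."""
    confident = p.confidence >= CONF_CUTOFF
    if p.sim_to_alternative >= SIM_CUTOFF and p.sim_to_alternative > p.sim_to_dominant:
        fold = "alternative"
    elif p.sim_to_dominant >= SIM_CUTOFF:
        fold = "dominant"
    else:
        fold = "inconsistent"  # matches neither experimental conformation
    return fold, confident


def summarize(preds: list[Prediction]) -> dict[tuple[str, bool], int]:
    """Count sampled structures per (fold label, is_confident) bucket."""
    counts: dict[tuple[str, bool], int] = {}
    for p in preds:
        key = classify(p)
        counts[key] = counts.get(key, 0) + 1
    return counts


def misleading(preds: list[Prediction]) -> list[Prediction]:
    """Inconsistent structures scored above the best alternative-fold hit."""
    alt_conf = [p.confidence for p in preds if classify(p)[0] == "alternative"]
    best_alt = max(alt_conf, default=0.0)
    return [
        p for p in preds
        if classify(p)[0] == "inconsistent" and p.confidence > best_alt
    ]


# Toy, made-up values purely to exercise the functions.
toy = [
    Prediction("a", 0.90, 0.30, 85.0),
    Prediction("b", 0.30, 0.80, 75.0),
    Prediction("c", 0.30, 0.35, 90.0),
    Prediction("d", 0.40, 0.20, 50.0),
]
```

Under these assumptions, `misleading(toy)` flags structure "c": it matches neither experimental conformation yet carries higher confidence than the one alternative-fold hit, which is the failure pattern the abstract reports.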

Source journal
Protein Science (Biology: Biochemistry & Molecular Biology)
CiteScore: 12.40
Self-citation rate: 1.20%
Articles published: 246
Review time: 1 month
Journal description: Protein Science, the flagship journal of The Protein Society, is a publication that focuses on advancing fundamental knowledge in the field of protein molecules. The journal welcomes original reports and review articles that contribute to our understanding of protein function, structure, folding, design, and evolution. Additionally, Protein Science encourages papers that explore the applications of protein science in various areas such as therapeutics, protein-based biomaterials, bionanotechnology, synthetic biology, and bioelectronics. The journal accepts manuscript submissions in any suitable format for review, with the requirement of converting the manuscript to journal-style format only upon acceptance for publication. Protein Science is indexed and abstracted in numerous databases, including the Agricultural & Environmental Science Database (ProQuest), Biological Science Database (ProQuest), CAS: Chemical Abstracts Service (ACS), Embase (Elsevier), Health & Medical Collection (ProQuest), Health Research Premium Collection (ProQuest), Materials Science & Engineering Database (ProQuest), MEDLINE/PubMed (NLM), Natural Science Collection (ProQuest), and SciTech Premium Collection (ProQuest).