Predicting multiple conformations of ligand binding sites in proteins suggests that AlphaFold2 may remember too much.

IF 9.4, Zone 1 (multidisciplinary journals), Q1 MULTIDISCIPLINARY SCIENCES
Maria Lazou, Omeir Khan, Thu Nguyen, Dzmitry Padhorny, Dima Kozakov, Diane Joseph-McCarthy, Sandor Vajda
Proceedings of the National Academy of Sciences of the United States of America, vol. 121, no. 48, e2412719121. Journal Article. Published 2024-11-26 (Epub 2024-11-20).
DOI: 10.1073/pnas.2412719121 (https://doi.org/10.1073/pnas.2412719121)

Abstract

The goal of this paper is to predict the conformational distributions of ligand binding sites using the AlphaFold2 (AF2) protein structure prediction program with stochastic subsampling of the multiple sequence alignment (MSA). We explored the opening of cryptic ligand binding sites in 16 proteins, where the closed and open conformations define the expected extremes of the conformational variation. Because the Protein Data Bank (PDB) contains many structures of these proteins, we were able to study whether the distribution of X-ray structures affects the distribution of AF2 models. We found that AF2 generates both a cluster of open models and a cluster of closed models for proteins that have comparable numbers of open and closed structures in the PDB and not too many other conformations. This was observed even with default MSA parameters, that is, without further subsampling. In contrast, with the exception of a single protein, AF2 did not yield multiple clusters of conformations for proteins that had imbalanced numbers of open and closed structures in the PDB or substantial numbers of other structures. Subsampling improved the results for only a single protein, and very shallow MSAs led to incorrect structures. The ability to generate both open and closed conformations for six of the 16 proteins agrees with the success rates of similar studies reported in the literature. However, we showed that this partial success is due to AF2 "remembering" the conformational distributions in the PDB and that the approach fails to predict rarely seen conformations.
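
To make the idea of stochastic MSA subsampling concrete, the following is a minimal sketch of the general technique: draw a random subset of alignment rows while always keeping the query sequence. It is an illustration only, not the authors' pipeline or AF2's internal implementation; the function name `subsample_msa`, the `depth` parameter, and the toy sequences are all hypothetical.

```python
import random


def subsample_msa(msa, depth, seed=None):
    """Return the query row plus up to depth - 1 randomly chosen other rows.

    msa   : list of aligned sequences; msa[0] is assumed to be the query.
    depth : total number of rows to keep (hypothetical parameter name).
    seed  : optional seed so that a given subsample can be reproduced.
    """
    if not msa:
        raise ValueError("MSA must contain at least the query sequence")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    query, others = msa[0], msa[1:]
    k = min(depth - 1, len(others))
    # Sample row indices without replacement, then sort them so the
    # retained rows keep their original relative order.
    picked = sorted(rng.sample(range(len(others)), k))
    return [query] + [others[i] for i in picked]


# Toy alignment (made-up sequences, for illustration only).
msa = ["MKTAYIAK", "MKTAYLAK", "MRTAYIAK", "MKSAYIAK", "MKTAFIAK", "-KTAYIAK"]

# Different seeds give different shallow alignments of the same depth.
for seed in range(3):
    print(subsample_msa(msa, depth=4, seed=seed))
```

In such a scheme, each subsampled alignment would be passed to a separate prediction run, and the resulting models compared or clustered. A very small `depth` corresponds to the very shallow MSAs that the abstract reports as leading to incorrect structures.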

Source journal
CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months
About the journal: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.