Assessing generative model coverage of protein structures with SHAPES.

IF 7.7
Cell systems Pub Date : 2025-08-20 Epub Date: 2025-07-29 DOI:10.1016/j.cels.2025.101347
Tianyu Lu, Melissa Liu, Yilin Chen, Jinho Kim, Po-Ssu Huang
Citations: 0

Abstract

Recent advances in generative modeling enable efficient sampling of protein structures, but their tendency to optimize for designability imposes a bias toward idealized structures at the expense of loops and other complex structural motifs that are critical for function. We introduce SHAPES (structural and hierarchical assessment of proteins with embedding similarity) to evaluate five state-of-the-art generative models of protein structures. Using structural embeddings across multiple structural hierarchies, ranging from local geometries to global protein architectures, we reveal substantial undersampling of the observed protein structure space by these models. We use Fréchet protein distance (FPD) to quantify distributional coverage. Different models are distinct in their coverage behavior across different sampling noise scales and temperatures. The frequency of tertiary motifs (TERMs) further supports the observations. More robust sequence design and structure prediction methods are likely crucial in guiding the development of models with improved coverage of the designable protein space. A record of this paper's transparent peer review process is included in the supplemental information.
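
The abstract names Fréchet protein distance (FPD) as the coverage metric but does not give its formula. The sketch below is only an illustration under a stated assumption: that FPD follows the usual Fréchet distance between two Gaussians fitted to embedding sets, d^2 = ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2 (C_x C_y)^(1/2)). The function name `frechet_distance` and the random arrays standing in for structural embeddings are hypothetical, not taken from the paper.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, n_features). This is the generic
    Gaussian formula, assumed here; the paper's exact FPD may differ.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)

    # Tr((C_x C_y)^(1/2)) equals the sum of square roots of the eigenvalues
    # of C_x C_y, which are real and non-negative for covariance matrices.
    # Clip small negative values caused by floating-point error.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


# Synthetic stand-ins for embeddings of reference and generated structures.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
generated_close = rng.normal(size=(500, 8))
generated_shifted = rng.normal(loc=1.0, scale=0.5, size=(500, 8))

d_close = frechet_distance(reference, generated_close)
d_shifted = frechet_distance(reference, generated_shifted)
print(f"matched distribution: {d_close:.3f}")
print(f"shifted, narrower distribution: {d_shifted:.3f}")
```

Under this assumption, a generated set that is shifted or narrower than the reference set yields a larger distance, which is the sense in which such a metric could flag undersampling of the observed structure space.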
