Random, de novo, and conserved proteins: How structure and disorder predictors perform differently.

IF 4.3, Zone 3 (Materials Science), Q1 (Engineering, Electrical & Electronic)
ACS Applied Electronic Materials Pub Date : 2024-06-01 Epub Date: 2024-01-16 DOI:10.1002/prot.26652
Lasse Middendorf, Lars A Eicholt

Abstract

Understanding the emergence and structural characteristics of de novo and random proteins is crucial for unraveling protein evolution and designing novel enzymes. However, experimental determination of their structures remains challenging. Recent advancements in protein structure prediction, particularly with AlphaFold2 (AF2), have expanded our knowledge of protein structures, but their applicability to de novo and random proteins is unclear. In this study, we investigate the structural predictions and confidence scores of AF2 and the protein language model-based predictor ESMFold for de novo and conserved proteins from Drosophila and a dataset of comparable random proteins. We find that the structural predictions for de novo and random proteins differ significantly from those for conserved proteins. Interestingly, a positive correlation between disorder and confidence scores (pLDDT) is observed for de novo and random proteins, in contrast to the negative correlation observed for conserved proteins. Furthermore, the performance of structure predictors for de novo and random proteins is hampered by the lack of sequence identity. We also observe fluctuating median predicted disorder among different sequence length quartiles for random proteins, suggesting an influence of sequence length on disorder predictions. In conclusion, while structure predictors provide initial insights into the structural composition of de novo and random proteins, their accuracy and applicability to such proteins remain limited. Experimental determination of their structures is necessary for a comprehensive understanding. The positive correlation between disorder and pLDDT could imply a potential for conditional folding and transient binding interactions of de novo and random proteins.
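The two summary statistics mentioned in the abstract (a correlation between predicted disorder and pLDDT, and the median predicted disorder per sequence-length quartile) can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient was used, so the Spearman rank correlation here is an assumption. The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import mean, median, quantiles


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def median_disorder_by_length_quartile(lengths, disorder):
    """Median predicted disorder within each sequence-length quartile."""
    q1, q2, q3 = quantiles(lengths, n=4)
    bins = [[], [], [], []]
    for length, d in zip(lengths, disorder):
        # Count how many quartile boundaries the length exceeds (0 to 3).
        idx = (length > q1) + (length > q2) + (length > q3)
        bins[idx].append(d)
    return [median(b) if b else None for b in bins]


# Hypothetical per-protein values, for illustration only.
toy_plddt = [40.0, 45.0, 55.0, 60.0, 70.0]
toy_disorder = [0.10, 0.20, 0.35, 0.50, 0.80]
print("Spearman rho:", spearman(toy_plddt, toy_disorder))

toy_lengths = [10, 20, 30, 40, 50, 60, 70, 80]
toy_disorder_by_len = [0.1, 0.3, 0.2, 0.4, 0.5, 0.7, 0.6, 0.8]
print("Median disorder per length quartile:",
      median_disorder_by_length_quartile(toy_lengths, toy_disorder_by_len))
```

A rank-based coefficient is used in this sketch because pLDDT and disorder scores need not be linearly related; a positive value would correspond to the pattern the abstract reports for de novo and random proteins, and a negative value to the pattern reported for conserved proteins.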
