DeepUSPS: Deep Learning-Empowered Unconstrained-Structural Protein Sequence Design.

IF 3.2; Zone 4 (Biology); JCR Q2 (Biochemistry & Molecular Biology)
Zhichong Ma, Jiawen Yang
Journal: Proteins-Structure Function and Bioinformatics. Publication type: Journal Article. Publication date: 2025-05-30. DOI: 10.1002/prot.26847 (https://doi.org/10.1002/prot.26847)

Abstract

Current models for unconstrained-structural protein sequence design suffer from low optimization efficiency, and the proteins they generate show significant similarity to natural proteins and low thermal stability. To address these challenges, we propose the Deep Learning-Empowered Unconstrained-Structural Protein Sequence Design (DeepUSPS) model. To address the inadequate thermal stability, we employ the Inverted Dense Residual Network (IDRNet). To reduce the similarity of the designed proteins to natural ones, we construct the Sequence-Pairwise Features Extraction Synthetic Network (SPFESN). Furthermore, we introduce the Warm Restart AngularGrad (WRA) optimizer to optimize the 3D Position-Specific Scoring Matrix (3Dpssm) for unconstrained-structural protein sequences; it requires only 2100 iterations (140.36 min) of updates to generate idealized (IDE) protein sequences. We obtained a total of 1000 IDE protein sequences and evaluated them in silico for similarity, clarity and iterations, thermal stability, spatial distribution of similarity, and predicted local-distance difference test (pLDDT) confidence. Notably, the mean lg(E-value) of the IDE protein sequences reached -0.051, the mean TM-score of the IDE protein structures reached 0.594, only 2100 iterations were needed, and the mean melting temperature (Tm) reached 74.78°C. The average pLDDT of the 3D structures reached 76. Additionally, the 3D structures of the IDE proteins are of diverse types. These in silico results demonstrate the superior performance of DeepUSPS compared with Hallucinate.
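
The abstract names a Warm Restart AngularGrad (WRA) optimizer but does not give its schedule. As a rough illustration only, the sketch below assumes "warm restart" means a cosine-annealed learning rate that is periodically reset to its maximum, as in the SGDR scheme. The function name `warm_restart_lr`, the cycle length, and the learning-rate bounds are hypothetical values chosen for the example, not taken from the paper, and the AngularGrad update itself is not shown. Only the 2100-iteration budget comes from the abstract.

```python
import math

def warm_restart_lr(step, cycle_len=300, lr_max=0.05, lr_min=0.0005):
    """Cosine-annealed learning rate that resets to lr_max every cycle_len steps.

    All default values are illustrative assumptions, not the paper's settings.
    """
    pos = step % cycle_len  # position inside the current cycle
    cos_term = 0.5 * (1.0 + math.cos(math.pi * pos / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_term

TOTAL_ITERATIONS = 2100  # iteration budget reported in the abstract

# Learning rate at every iteration of the run.
schedule = [warm_restart_lr(t) for t in range(TOTAL_ITERATIONS)]

# A restart is any step where the learning rate jumps back up.
restarts = [t for t in range(1, TOTAL_ITERATIONS) if schedule[t] > schedule[t - 1]]
print(f"{len(restarts)} restarts at steps {restarts}")
```

Within each cycle the rate decays smoothly, which favors fine-grained updates, while each reset lets the optimizer leave a poor local solution; whether DeepUSPS uses this exact form is not stated in the abstract.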

Source journal
Proteins-Structure Function and Bioinformatics (Biology: Biochemistry & Molecular Biology)
CiteScore: 5.90
Self-citation rate: 3.40%
Articles published: 172
Review time: 3 months
Journal description: PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.