Authors: Zhichong Ma, Jiawen Yang
Journal: Proteins-Structure Function and Bioinformatics (Impact Factor 3.2; JCR Q2, Biochemistry & Molecular Biology)
Publication date: 2025-05-30
Publication type: Journal Article
DOI: 10.1002/prot.26847 (https://doi.org/10.1002/prot.26847)
Citations: 0
Abstract
DeepUSPS: Deep Learning-Empowered Unconstrained-Structural Protein Sequence Design.
Current models for unconstrained-structural protein sequence design suffer from low optimization efficiency, and the proteins they generate show significant similarity to natural proteins and low thermal stability. To address these challenges, we propose the Deep Learning-Empowered Unconstrained-Structural Protein Sequence Design (DeepUSPS) model. To address the inadequate thermal stability, we employ the innovative Inverted Dense Residual Network (IDRNet). To mitigate the similarity of the designed proteins to natural proteins, we construct the Sequence-Pairwise Features Extraction Synthetic Network (SPFESN). Furthermore, we introduce the Warm Restart AngularGrad (WRA) optimizer to optimize the 3D Position-Specific Scoring Matrix (3Dpssm) for unconstrained-structural protein sequences; it requires only 2100 update iterations (140.36 min) to generate idealized (IDE) protein sequences. We obtained a total of 1000 IDE protein sequences and evaluated them with in silico experiments covering similarity, clarity and iterations, thermal stability, spatial distribution of similarity, and predicted local-distance difference test (pLDDT) confidence assessment. Notably, the mean lg(E-value) of the IDE protein sequences reached -0.051, the mean TM-score of the IDE protein structures reached 0.594, only 2100 iterations were needed, and the mean Tm (melting temperature) reached 74.78°C. The average pLDDT of the 3D structures reached 76. Additionally, the 3D structures of the IDE proteins are of diverse types. These in silico results conclusively demonstrate the superior performance of DeepUSPS compared with Hallucinate.
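The abstract does not say how the warm restarts in the WRA optimizer are scheduled. As a rough illustration only, the sketch below assumes an SGDR-style cosine schedule in which the learning rate decays within a cycle and then jumps back to its maximum. The function name `warm_restart_lr`, the cycle length of 300, and both learning-rate bounds are hypothetical values chosen for the example; only the 2100-iteration budget comes from the abstract, and the AngularGrad update itself is not modeled.

```python
import math

# Hypothetical settings for illustration; none of these come from the paper.
CYCLE_LEN = 300   # iterations between restarts (assumed)
LR_MAX = 0.1      # learning rate right after a restart (assumed)
LR_MIN = 0.001    # learning rate at the end of a cycle (assumed)

TOTAL_ITERATIONS = 2100  # iteration budget reported in the abstract


def warm_restart_lr(step, cycle_len=CYCLE_LEN, lr_max=LR_MAX, lr_min=LR_MIN):
    """SGDR-style cosine schedule: decay from lr_max toward lr_min, then restart."""
    t = step % cycle_len  # position inside the current cycle
    cos_term = 0.5 * (1.0 + math.cos(math.pi * t / cycle_len))
    return lr_min + (lr_max - lr_min) * cos_term


# Learning rate for every iteration of the budget.
schedule = [warm_restart_lr(s) for s in range(TOTAL_ITERATIONS)]

# Steps at which the learning rate jumps back to lr_max.
restart_steps = [s for s in range(1, TOTAL_ITERATIONS) if s % CYCLE_LEN == 0]
```

With these assumed settings the 2100 iterations split into seven cycles of equal length. An actual optimizer would consume `schedule[s]` as its step size at iteration `s`; whether DeepUSPS uses fixed or growing cycle lengths is not stated in the abstract.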
About the journal:
PROTEINS: Structure, Function, and Bioinformatics publishes original reports of significant experimental and analytic research in all areas of protein research: structure, function, computation, genetics, and design. The journal encourages reports that present new experimental or computational approaches for interpreting and understanding data from biophysical chemistry, structural studies of proteins and macromolecular assemblies, alterations of protein structure and function engineered through techniques of molecular biology and genetics, functional analyses under physiologic conditions, as well as the interactions of proteins with receptors, nucleic acids, or other specific ligands or substrates. Research in protein and peptide biochemistry directed toward synthesizing or characterizing molecules that simulate aspects of the activity of proteins, or that act as inhibitors of protein function, is also within the scope of PROTEINS. In addition to full-length reports, short communications (usually not more than 4 printed pages) and prediction reports are welcome. Reviews are typically by invitation; authors are encouraged to submit proposed topics for consideration.