CPE-Pro: A Structure-Sensitive Deep Learning Method for Protein Representation and Origin Evaluation.

IF 3.9 · Zone 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences Pub Date : 2025-06-08 DOI:10.1007/s12539-025-00732-4

Wenrui Gou, Wenhui Ge, Yang Tan, Mingchen Li, Guisheng Fan, Huiqun Yu


Citations: 0

Abstract




Protein structures are fundamental to understanding their functions and interactions. With the continuous advancement of protein structure prediction methods, structure databases are rapidly expanding. Identifying the origin of protein structures is crucial for assessing the reliability of experimental resolution and computational prediction methods, as well as for guiding downstream biological research. Existing protein representation approaches often fail to capture subtle yet critical structural differences, posing challenges for precise structural traceability. To address this, we propose a structure-sensitive supervised deep learning model, Crystal vs Predicted Evaluator for Protein Structure (CPE-Pro), for the representation and origin evaluation of protein structures. CPE-Pro integrates a pre-trained protein Structural Sequence Language Model (SSLM) and Geometric Vector Perceptron-Graph Neural Network (GVP-GNN) to learn structure-aware protein representations and capture structural differences, enabling accurate classification across four origins of structural data. Preliminary results indicate that, compared to large-scale protein language models trained on extensive amino acid sequences, structural sequences enriched with local structural features enable the model to capture more informative protein characteristics, thereby enhancing and refining protein representations. Future research directions include extending the architecture to additional protein structure paradigms and developing evaluation methodologies for low-pLDDT predicted structures, providing more effective tools for protein structure analysis. The code, model weights, and all relevant materials are available at https://github.com/wr1102/CPE-Pro.
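As a rough illustration of the two-branch idea in the abstract (one embedding from a structural-sequence language model, one from a graph network, combined for a four-way origin classification), here is a minimal sketch in plain Python. It is not the authors' implementation: the concatenation-based fusion, the single linear layer, the embedding sizes, the random weights, and the placeholder labels `origin_0` to `origin_3` are all assumptions made for illustration. The abstract does not specify the fusion rule or name the four origins; the released code at the repository above is the authoritative reference.

```python
import math
import random

# Hypothetical placeholder labels: the abstract mentions "four origins"
# of structural data but does not name them.
ORIGINS = ["origin_0", "origin_1", "origin_2", "origin_3"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(seq_embedding, graph_embedding):
    """Combine the two views by concatenation (an assumed fusion rule)."""
    return list(seq_embedding) + list(graph_embedding)


def classify(fused, weights, bias):
    """One linear layer followed by softmax; returns (label, probabilities)."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ORIGINS[best], probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for the outputs of the two encoders (sizes are arbitrary).
    seq_emb = [rng.uniform(-1, 1) for _ in range(8)]
    graph_emb = [rng.uniform(-1, 1) for _ in range(4)]
    fused = fuse(seq_emb, graph_emb)
    # Untrained random weights, only to make the sketch executable.
    weights = [[rng.uniform(-1, 1) for _ in fused] for _ in ORIGINS]
    bias = [0.0] * len(ORIGINS)
    label, probs = classify(fused, weights, bias)
    print(label, [round(p, 3) for p in probs])
```

In a trained model the two embeddings would come from the pre-trained SSLM and the GVP-GNN, and the classifier weights would be learned under supervision. The sketch only shows how two representations of one structure can feed a single four-class decision.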


Source journal

Interdisciplinary Sciences: Computational Life Sciences (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore: 8.60

Self-citation rate: 4.20%


About the journal: Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, with a particular focus on computational life sciences, a rapidly developing area at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and then appear as Online First™ articles on SpringerLink even before the issue is built or sent to the printer. The editorial board includes many leading scientists with international reputations, among them Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his groundbreaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to build a sound international reputation.