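An interpretable deep geometric learning model to predict the effects of mutations on protein–protein interactions using large-scale protein language model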

IF 5.7, Zone 2 (Chemistry), JCR Q1: CHEMISTRY, MULTIDISCIPLINARY

Journal of Cheminformatics, Pub Date: 2025-03-21, DOI: 10.1186/s13321-025-00979-5

Caiya Zhang, Yan Sun, Pingzhao Hu


Citations: 0

Abstract


An interpretable deep geometric learning model to predict the effects of mutations on protein–protein interactions using large-scale protein language model

Protein–protein interactions (PPIs) are central to the mechanisms of signaling pathways and immune responses, and studying them can help us understand disease etiology. Therefore, there is a significant need for efficient and rapid automated approaches to predict changes in PPIs. In recent years, deep learning techniques have increasingly been applied to predict changes in binding affinity between the original protein complex and its mutant variants. In particular, graph neural networks (GNNs) have gained prominence for their ability to learn representations of protein–protein complexes. However, conventional GNNs have mainly concentrated on capturing local features, often disregarding interactions among distant elements that may hold important information. In this study, we developed a transformer-based graph neural network to extract features of the mutant segment from the three-dimensional structure of protein–protein complexes. By capturing both local and global features, the approach gives a more comprehensive view of these intricate relationships and thus promises more accurate predictions of binding affinity changes. To enhance the representation capability of protein features, we incorporate a large-scale pre-trained protein language model into our approach and employ the global protein feature it provides. On test and validation cases, the proposed model predicts mutation-induced changes in binding affinity with a root mean square error of 1.10 and a Pearson correlation coefficient close to 0.71. Our experiments on all five datasets, including both single-mutant and multiple-mutant cases, demonstrate that our model outperforms four state-of-the-art baseline methods, and its efficacy was evaluated through comprehensive experiments.
Our study introduces a transformer-based graph neural network approach to accurately predict changes in protein–protein interactions (PPIs). By integrating local and global features and leveraging pretrained protein language models, our model outperforms state-of-the-art methods across diverse datasets. The results of this study may offer new perspectives for studying immune responses and disease etiology related to protein mutations. Furthermore, this approach may contribute to other biological or biochemical studies related to PPIs.
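The abstract reports performance as a root mean square error and a Pearson correlation coefficient. As a reminder of what those two numbers measure, here is a minimal sketch in plain Python. The function names and the sample values are assumptions made for illustration; they are not the authors' evaluation code or data.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_m * var_p)


# Hypothetical values, invented for illustration only (not data from the paper).
measured_change = [0.5, 1.2, -0.3, 2.1, 0.0]
predicted_change = [0.7, 0.9, 0.1, 1.8, 0.4]

print(f"RMSE:    {rmse(measured_change, predicted_change):.3f}")
print(f"Pearson: {pearson(measured_change, predicted_change):.3f}")
```

The two metrics are complementary: RMSE penalizes the absolute size of the errors, while the Pearson coefficient only checks whether predictions rise and fall with the measurements, so a model can score well on one and poorly on the other.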

Scientific contribution: We developed a novel transformer-based graph neural network tailored to predict changes in protein–protein interactions (PPIs) with excellent accuracy. By seamlessly integrating both local and global features extracted from the three-dimensional structure of protein–protein complexes, and leveraging the rich representations provided by pretrained protein language models, our approach surpasses existing methods across diverse datasets. Our findings may offer novel insights into complex disease etiology associated with protein mutations. The tool may be applicable to various biological and biochemical investigations involving protein mutations.
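To make the contrast between local and global features concrete, the sketch below compares two generic operations on a toy graph: averaging over direct neighbors (local) and unparameterized scaled dot-product self-attention over all nodes (global). This is an illustrative assumption, not the architecture described in the paper; the function names, the toy graph, and the choice to concatenate the two outputs are all hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def local_aggregate(features: List[Vector],
                    neighbors: Dict[int, List[int]]) -> List[Vector]:
    """Average each node's features with those of its direct neighbors."""
    out = []
    for i, own in enumerate(features):
        group = [own] + [features[j] for j in neighbors.get(i, [])]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def global_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention over all nodes, without learned weights."""
    dim = len(features[0])
    out = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in features]
        top = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, features)) / total
                    for d in range(dim)])
    return out


def combine(features: List[Vector],
            neighbors: Dict[int, List[int]]) -> List[Vector]:
    """Concatenate the local and global views of every node (an assumed choice)."""
    local = local_aggregate(features, neighbors)
    glob = global_attention(features)
    return [l + g for l, g in zip(local, glob)]


# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 with two features per node.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

for node, row in enumerate(combine(toy_features, toy_neighbors)):
    print(node, [round(x, 3) for x in row])
```

In this toy setting, changing the features of node 3 leaves the local output of node 0 untouched, because the two nodes are not adjacent, but it does change node 0's attention output. That is the kind of long-range dependence a purely neighborhood-based layer cannot see in a single step.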


Source journal

Journal of Cheminformatics (CHEMISTRY, MULTIDISCIPLINARY; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 14.10

Self-citation rate: 7.00%

Articles published: (not listed)

Review time: 3 months

About the journal: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases; molecular modelling; chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases; computer and molecular graphics; computer-aided molecular design; expert systems; QSAR; and data mining techniques.