MIPPIS: protein-protein interaction site prediction network with multi-information fusion.

IF 2.9 3区生物学 Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics Pub Date : 2024-11-04 DOI:10.1186/s12859-024-05964-7

Shuang Wang, Kaiyu Dong, Dingming Liang, Yunjing Zhang, Xue Li, Tao Song

{"title":"MIPPIS: protein-protein interaction site prediction network with multi-information fusion.","authors":"Shuang Wang, Kaiyu Dong, Dingming Liang, Yunjing Zhang, Xue Li, Tao Song","doi":"10.1186/s12859-024-05964-7","DOIUrl":null,"url":null,"abstract":"Background: The prediction of protein-protein interaction sites plays a crucial role in biochemical processes. Investigating the interaction between viruses and receptor proteins through biological techniques aids in understanding disease mechanisms and guides the development of corresponding drugs. While various methods have been proposed in the past, they often suffer from drawbacks such as long processing times, high costs, and low accuracy.Results: Addressing these challenges, we propose a novel protein-protein interaction site prediction network based on multi-information fusion. In our approach, the initial amino acid features are depicted by the position-specific scoring matrix, hidden Markov model, dictionary of protein secondary structure, and one-hot encoding. Simultaneously, we adopt a multi-channel approach to extract deep-level amino acids features from different perspectives. The graph convolutional network channel effectively extracts spatial structural information. The bidirectional long short-term memory channel treats the amino acid sequence as natural language, capturing the protein's primary structure information. The ProtT5 protein large language model channel outputs a more comprehensive amino acid embedding representation, providing a robust complement to the two aforementioned channels. Finally, the obtained amino acid features are fed into the prediction layer for the final prediction.Conclusion: Compared with six protein structure-based methods and six protein sequence-based methods, our model achieves optimal performance across evaluation metrics, including accuracy, precision, F1, Matthews correlation coefficient, and area under the precision recall curve, which demonstrates the superiority of our model.","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"25 1","pages":"345"},"PeriodicalIF":2.9000,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11536593/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-024-05964-7","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Background: The prediction of protein-protein interaction sites plays a crucial role in biochemical processes. Investigating the interaction between viruses and receptor proteins through biological techniques aids in understanding disease mechanisms and guides the development of corresponding drugs. While various methods have been proposed in the past, they often suffer from drawbacks such as long processing times, high costs, and low accuracy.

Results: Addressing these challenges, we propose a novel protein-protein interaction site prediction network based on multi-information fusion. In our approach, the initial amino acid features are depicted by the position-specific scoring matrix, hidden Markov model, dictionary of protein secondary structure, and one-hot encoding. Simultaneously, we adopt a multi-channel approach to extract deep-level amino acids features from different perspectives. The graph convolutional network channel effectively extracts spatial structural information. The bidirectional long short-term memory channel treats the amino acid sequence as natural language, capturing the protein's primary structure information. The ProtT5 protein large language model channel outputs a more comprehensive amino acid embedding representation, providing a robust complement to the two aforementioned channels. Finally, the obtained amino acid features are fed into the prediction layer for the final prediction.

Conclusion: Compared with six protein structure-based methods and six protein sequence-based methods, our model achieves optimal performance across evaluation metrics, including accuracy, precision, F₁, Matthews correlation coefficient, and area under the precision recall curve, which demonstrates the superiority of our model.

查看原文本刊更多论文

MIPPIS：多信息融合的蛋白质-蛋白质相互作用位点预测网络。

背景：预测蛋白质与蛋白质之间的相互作用位点在生化过程中起着至关重要的作用。通过生物技术研究病毒与受体蛋白之间的相互作用有助于了解疾病机理并指导相应药物的开发。过去曾提出过多种方法，但往往存在处理时间长、成本高、准确性低等缺点：针对这些挑战，我们提出了一种基于多信息融合的新型蛋白质-蛋白质相互作用位点预测网络。在我们的方法中，初始氨基酸特征由特定位置评分矩阵、隐马尔可夫模型、蛋白质二级结构字典和单次编码来描述。同时，我们采用多通道方法从不同角度提取深层次氨基酸特征。图卷积网络通道能有效提取空间结构信息。双向长短期记忆通道将氨基酸序列视为自然语言，捕捉蛋白质的主要结构信息。ProtT5 蛋白质大语言模型通道输出更全面的氨基酸嵌入表示，为上述两个通道提供了稳健的补充。最后，将获得的氨基酸特征输入预测层进行最终预测：结论：与六种基于蛋白质结构的方法和六种基于蛋白质序列的方法相比，我们的模型在准确率、精确度、F1、马太相关系数和精确召回曲线下面积等评价指标上都达到了最佳性能，这证明了我们模型的优越性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

BMC Bioinformatics 生物-生化研究方法

CiteScore

5.70

自引率

3.30%

发文量

506

审稿时长

4.3 months

期刊介绍： BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.