CAA-PPI: A Computational Feature Design to Predict Protein–Protein Interactions Using Different Encoding Strategies

Impact factor: 5.0; JCR quartile: Q2 (Computer Science, Artificial Intelligence)

AI (Basel, Switzerland). Publication date: 2023-04-28. DOI: 10.3390/ai4020020

Bhawna Mewara, Gunjan Sahni, Soniya Lalwani, Rajesh Kumar


Citation count: 0

Abstract

Protein–protein interactions (PPIs) are involved in a wide variety of biological processes, including cell-to-cell interactions and metabolic and developmental control. Understanding PPIs is becoming one of the most important aims of systems biology. PPIs play a fundamental part in predicting the function of a target protein and the druggability of molecules. Much work has gone into developing methods that predict PPIs computationally, because such methods supplement laboratory experiments and offer a cost-effective way of predicting the most likely set of interactions at the scale of the entire proteome. This article presents a novel feature representation method (CAA-PPI) that extracts features from protein sequences using two different encoding strategies, followed by an ensemble learning method. A random forest was used as the classifier for PPI prediction. CAA-PPI considers the role of the trigram and the bond of a given amino acid with its neighbouring residues. The proposed PPI model achieved a prediction accuracy of more than 98% with one encoding scheme and more than 95% with the other on two diverse PPI datasets, H. pylori and Yeast. Further comparison of CAA-PPI with existing sequence-based methods showed that the proposed method performs well with both encoding strategies. To further assess its practical predictive ability, a blind test was run on datasets from five other species, independent of the training set, and the results confirmed the effectiveness of CAA-PPI with both encoding schemes.
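
The abstract does not spell out the two encoding schemes, so the following is only a minimal sketch of the general idea of trigram-based sequence features, not the CAA-PPI method itself. It assumes a 7-class grouping of amino acids in the style of conjoint-triad encodings; the grouping and the names `GROUPS`, `encode`, `trigram_features`, and `pair_features` are all hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import product

# ASSUMPTION: an illustrative 7-class grouping of the 20 standard amino acids.
# It is not taken from the CAA-PPI paper.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: str(i) for i, grp in enumerate(GROUPS) for aa in grp}

# All 7^3 = 343 possible class trigrams, in a fixed order.
TRIGRAMS = ["".join(t) for t in product("0123456", repeat=3)]


def encode(seq):
    """Map residues to class labels; non-standard residues are dropped,
    so their neighbours become adjacent in the encoded string."""
    return "".join(AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS)


def trigram_features(seq):
    """Return normalized frequencies of overlapping class trigrams."""
    encoded = encode(seq)
    counts = Counter(encoded[i:i + 3] for i in range(len(encoded) - 2))
    total = sum(counts.values())
    if total == 0:  # sequence shorter than three usable residues
        return [0.0] * len(TRIGRAMS)
    return [counts[t] / total for t in TRIGRAMS]


def pair_features(seq_a, seq_b):
    """Concatenate the two proteins' vectors into one pair descriptor."""
    return trigram_features(seq_a) + trigram_features(seq_b)
```

A pair descriptor built this way has 686 entries and could be passed, together with interaction labels, to any standard classifier, for example scikit-learn's `RandomForestClassifier`; the hyperparameters the authors used are not given in the abstract.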
