RF-PSSM: A Combination of Rotation Forest Algorithm and Position-Specific Scoring Matrix for Improved Prediction of Protein-Protein Interactions Between Hepatitis C Virus and Human

Impact factor 7.7 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence)
Xin Liu;Yaping Lu;Liang Wang;Wei Geng;Xinyi Shi;Xiao Zhang
Big Data Mining and Analytics, vol. 6, no. 1, pp. 21-31. Published 2022-11-24. DOI: 10.26599/BDMA.2022.9020031
https://ieeexplore.ieee.org/document/9962955/
Citations: 1

Abstract

Identifying interactions between hepatitis C virus (HCV) and human proteins will not only help us understand the molecular mechanisms of related diseases but also be conducive to discovering new drug targets. An increasing number of clinically and experimentally validated interactions between HCV and human proteins have been documented in public databases, facilitating studies based on computational methods. In this study, we propose a new computational approach, rotation forest position-specific scoring matrix (RF-PSSM), to predict interactions between HCV and human proteins. Specifically, a PSSM is used to characterize each protein, and two-dimensional principal component analysis (2DPCA) is then applied to extract features from the PSSM. Finally, rotation forest (RF) is used for classification. The results of various ablation experiments show that, on independent datasets, the accuracy and area under the curve (AUC) of RF-PSSM reach 93.74% and 94.29%, respectively, outperforming almost all cutting-edge methods. In addition, we used RF-PSSM to predict 9 human proteins that may interact with HCV protein E1, which can provide theoretical guidance for future experimental studies.
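
The feature-extraction step described in the abstract can be illustrated with a minimal 2DPCA sketch. This is not the authors' implementation: it assumes every PSSM has already been brought to a common shape (here a hypothetical 50 rows by 20 columns, for example by padding or truncation), uses random numbers in place of real PSSMs, and centers the matrices before building the scatter matrix. The function names `fit_2dpca` and `transform_2dpca` and the way a pair vector is formed by concatenation are illustrative assumptions. The rotation forest classifier is not shown.

```python
import numpy as np


def fit_2dpca(matrices, n_components):
    """Learn a 2DPCA projection from a stack of equally sized matrices.

    matrices: array of shape (n_samples, rows, cols).
    Returns (mean, projection), where projection has shape (cols, n_components).
    """
    A = np.asarray(matrices, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean
    # Scatter matrix: average over samples of (A_i - mean)^T (A_i - mean), shape (cols, cols).
    G = np.einsum("mri,mrj->ij", centered, centered) / A.shape[0]
    # G is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order]


def transform_2dpca(matrices, projection):
    """Project each matrix onto the learned axes and flatten it into a feature vector."""
    A = np.asarray(matrices, dtype=float)
    projected = A @ projection  # shape (n_samples, rows, n_components)
    return projected.reshape(A.shape[0], -1)


# Toy data standing in for real PSSMs (assumption: 6 proteins, 50 x 20 each).
rng = np.random.default_rng(0)
pssms = rng.normal(size=(6, 50, 20))

mean, projection = fit_2dpca(pssms, n_components=4)
features = transform_2dpca(pssms, projection)

# Hypothetical pair representation: concatenate the two proteins' feature vectors.
pair_vector = np.concatenate([features[0], features[1]])
```

In this sketch each protein is reduced from 50 x 20 values to 50 x 4, and a protein pair is represented by the two flattened vectors joined end to end; such pair vectors would then be passed to whatever classifier is used.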
Source journal
Big Data Mining and Analytics (subject area: Computer Science, Computer Science Applications)
CiteScore: 20.90
Self-citation rate: 2.20%
Articles published: 84
Journal description: Big Data Mining and Analytics, published by Tsinghua University Press, presents research on big data and its applications. The journal covers the exploration and analysis of large volumes of data from diverse sources to uncover hidden patterns, correlations, insights, and knowledge. It features the latest developments, research issues, and solutions, with a focus on data mining techniques, data analytics, and their practical applications. Big Data Mining and Analytics is indexed and abstracted in ESCI, EI, Scopus, DBLP Computer Science, Google Scholar, INSPEC, CSCD, DOAJ, CNKI, and more.