Authors: Ji-Yong An, Yong Zhou, Yu-Jun Zhao, Zi-Ji Yan
Journal: Evolutionary Bioinformatics Online
Published: 2019-01-01
DOI: https://doi.org/10.1177/1176934319879920
Citations: 13
An Efficient Feature Extraction Technique Based on Local Coding PSSM and Multifeatures Fusion for Predicting Protein-Protein Interactions
Background: Increasing evidence indicates that protein-protein interactions (PPIs) play important roles in many aspects of the structural and functional organization of a cell. Uncovering potential PPIs therefore remains an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have improved the prediction of PPIs, there remains room for improvement through novel and effective feature extraction methods and classifiers.

Method: In this study, we propose a sequence-based feature extraction method called LCPSSMMF, which combines a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we use a novel local coding method based on the PSSM to build a new matrix (CPSSM); the advantage of this method is that it incorporates both global and local feature extraction, and so can account for interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopt 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple kinds of key feature information from the evolutionary information embedded in the CPSSM. Finally, the feature vectors are obtained with a multifeatures fusion method.

Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied the method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. We also compared the proposed method with previous sequence-based approaches on the yeast datasets using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods. These results indicate that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and performs well in identifying PPIs. To facilitate future proteomics research, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM.
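
The feature extraction and fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the local coding step that produces the CPSSM, so the input here is simply any L x 20 matrix standing in for it, assumed to be already scaled to [0, 1]. The function names, the choice of 20 groups for LAG, the bigram formula B[m, n] = sum_i P[i, m] * P[i+1, n], and the concatenation of the two proteins' vectors into one pair vector are all assumptions made for illustration.

```python
import numpy as np


def local_average_group(matrix: np.ndarray, n_groups: int = 20) -> np.ndarray:
    """Split the rows into n_groups contiguous segments and average each one.

    Returns n_groups * 20 values. The group count of 20 is an assumption.
    """
    if matrix.shape[0] < n_groups:
        raise ValueError("sequence is shorter than the number of groups")
    # array_split tolerates lengths that are not divisible by n_groups.
    segments = np.array_split(matrix, n_groups, axis=0)
    return np.concatenate([seg.mean(axis=0) for seg in segments])


def bigram_probability(matrix: np.ndarray) -> np.ndarray:
    """Bigram features: B[m, n] = sum_i P[i, m] * P[i + 1, n], flattened to 400."""
    return (matrix[:-1].T @ matrix[1:]).ravel()


def fuse(matrix: np.ndarray, n_groups: int = 20) -> np.ndarray:
    """Multifeature fusion sketched as plain concatenation of LAG and BP."""
    return np.concatenate(
        [local_average_group(matrix, n_groups), bigram_probability(matrix)]
    )


def pair_features(matrix_a: np.ndarray, matrix_b: np.ndarray) -> np.ndarray:
    """Feature vector for a protein pair (concatenation is an assumption)."""
    return np.concatenate([fuse(matrix_a), fuse(matrix_b)])


# Random stand-ins for two CPSSMs of different sequence lengths.
rng = np.random.default_rng(0)
cpssm_a = rng.random((120, 20))
cpssm_b = rng.random((75, 20))

x = pair_features(cpssm_a, cpssm_b)
print(x.shape)  # (1600,)
```

Because both descriptors have a fixed length regardless of sequence length, proteins of different sizes map to vectors of the same dimension, which is what a classifier such as an SVM requires. The resulting vectors could then be passed to any SVM implementation, for example scikit-learn's `sklearn.svm.SVC`; the abstract does not say which implementation or kernel the authors used.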