SCPSSMpred: A General Sequence-based Method for Ligand-binding Site Prediction
Chun Fang, T. Noguchi, H. Yamana
IPSJ Transactions on Bioinformatics, vol. 6, pp. 35-42, published 2013-06-01
DOI: 10.2197/IPSJTBIO.6.35 (https://doi.org/10.2197/IPSJTBIO.6.35)
Citations: 1
Abstract
In this paper, we propose a novel method, SCPSSMpred (Smoothed and Condensed PSSM based prediction), which uses a simplified position-specific scoring matrix (PSSM) to predict ligand-binding sites. Although the simplified PSSM has only ten dimensions, it combines a rich set of features: amino acid arrangement, information about neighboring residues, physicochemical properties, and evolutionary information. Our method takes no predictions from other classifiers as input; all of its features are extracted from the sequence alone. Three ligands (FAD, NAD, and ATP) were used to verify the versatility of the method, and three traditional alternative methods were analyzed for comparison. All methods were tested at both the residue level and the protein sequence level. Experimental results showed that SCPSSMpred achieved the best performance while also removing 50% of the redundant features in the PSSM. In addition, it adapted remarkably well to unbalanced data compared with the other methods when tested at the protein sequence level. This study not only demonstrates the importance of reducing redundant features in the PSSM, but also identifies sequence-derived hallmarks of ligand-binding sites, namely that both the arrangement and the physicochemical properties of neighboring residues significantly affect ligand-binding behavior.
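To make the idea of a "smoothed and condensed" PSSM concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not specify the smoothing rule, the window size, or how the 20 amino-acid columns are reduced to ten, so everything here is an assumption: smoothing is taken to be an average over a sliding window of neighboring residues, condensing is taken to be a per-group mean, and the ten-group partition `GROUPS`, the column order `AMINO_ACIDS`, and the function names are hypothetical choices made only for illustration.

```python
from typing import List, Sequence

# Assumed column order of a 20-column PSSM (the order PSI-BLAST commonly prints).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# HYPOTHETICAL partition of the 20 amino acids into ten groups by rough
# physicochemical similarity. The abstract does not give the real grouping.
GROUPS: List[str] = ["AG", "ST", "C", "P", "ILV", "M", "FWY", "DE", "NQ", "HKR"]


def smooth_pssm(pssm: Sequence[Sequence[float]], half_window: int = 1) -> List[List[float]]:
    """Average each row with its neighbors inside a sliding window.

    The window is clipped at the sequence ends (an assumption), so edge
    residues are averaged over fewer rows.
    """
    n = len(pssm)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        window = pssm[lo:hi]
        # zip(*window) iterates over columns of the windowed rows.
        smoothed.append([sum(col) / len(window) for col in zip(*window)])
    return smoothed


def condense_pssm(pssm: Sequence[Sequence[float]], groups: Sequence[str] = GROUPS) -> List[List[float]]:
    """Reduce 20 columns to len(groups) columns by averaging within each group."""
    index = {aa: j for j, aa in enumerate(AMINO_ACIDS)}
    return [
        [sum(row[index[aa]] for aa in group) / len(group) for group in groups]
        for row in pssm
    ]


def smoothed_condensed_pssm(pssm: Sequence[Sequence[float]], half_window: int = 1) -> List[List[float]]:
    """Smooth over neighboring residues, then condense to ten dimensions."""
    return condense_pssm(smooth_pssm(pssm, half_window))


# Toy input: three residues, each row filled with a constant score.
toy_pssm = [[float(r)] * 20 for r in range(3)]
features = smoothed_condensed_pssm(toy_pssm, half_window=1)
```

Under these assumptions each residue ends up with a ten-value feature vector that already carries information from its neighbors, which is the kind of compact per-residue input a downstream classifier could consume.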