A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites
Authors: Lingwei Lai, Jing Geng, Haochen Duan, Siyuan Chen, Lvwen Huang, Jiantao Yu
Journal: Journal of Computational Biology, pp. 520-536 (Journal Article)
Published: 2025-05-01 (Epub 2025-02-26)
DOI: https://doi.org/10.1089/cmb.2024.0804
Impact factor: 1.4; JCR: Q4 (Biochemical Research Methods)
Citations: 0
Abstract
A New Structure Feature Introduced to Predict Protein-Protein Interaction Sites.
Interactions between proteins often depend on both the sequence features and the structure features of the proteins involved. Both kinds of features help machine learning methods predict protein-protein interaction (PPI) sites. In this study, we introduce a new structure feature, the concave-convex feature of the protein surface, which is computed from the structural data of proteins in the Protein Data Bank (PDB). We then construct a prediction model that combines protein sequence features and structure features, named SSPPI_Ensemble (Sequence and Structure geometric feature-based PPI site prediction). Three sequence features are used: PSSMs (Position-Specific Scoring Matrices), HMMs (Hidden Markov Models), and the raw protein sequence. The Dictionary of Secondary Structure in Proteins (DSSP) and the concave-convex feature are used as the structure features. Compared with other prediction methods on the same test datasets, our method achieves better performance or shows clear advantages, confirming that the proposed concave-convex feature is useful for predicting PPI sites.
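The abstract does not define how the concave-convex feature is computed, so the following is only an illustrative sketch and not the authors' method. It assumes a simple stand-in: the number of neighboring residue coordinates within a fixed radius, scaled to [0, 1], where crowded positions are treated as concave-like and sparse positions as convex-like. The function names, the radius, and the toy coordinates are all hypothetical. The last function shows how such a score could be appended to per-residue sequence and structure feature vectors.

```python
import math


def neighbor_counts(coords, radius=10.0):
    """For each 3D point, count the other points within `radius` (angstroms)."""
    counts = []
    for i, a in enumerate(coords):
        n = 0
        for j, b in enumerate(coords):
            if i != j and math.dist(a, b) <= radius:
                n += 1
        counts.append(n)
    return counts


def concavity_scores(coords, radius=10.0):
    """Min-max scale neighbor counts to [0, 1].

    Assumption (not from the paper): values near 1 mark crowded,
    concave-like positions; values near 0 mark exposed, convex-like ones.
    """
    counts = neighbor_counts(coords, radius)
    lo, hi = min(counts), max(counts)
    if hi == lo:
        return [0.0 for _ in counts]
    return [(c - lo) / (hi - lo) for c in counts]


def residue_features(sequence_feats, structure_feats, concavity):
    """Concatenate, per residue, sequence features, structure features,
    and the scalar concavity score into one flat vector."""
    return [
        list(s) + list(t) + [c]
        for s, t, c in zip(sequence_feats, structure_feats, concavity)
    ]


# Toy coordinates (hypothetical): four clustered points and one isolated point.
coords = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (0.0, 3.0, 0.0),
    (0.0, 0.0, 3.0),
    (30.0, 0.0, 0.0),
]
scores = concavity_scores(coords, radius=5.0)
```

In practice the per-residue vectors would hold the PSSM, HMM, raw-sequence, and DSSP features named in the abstract; the way SSPPI_Ensemble actually combines them is not specified here.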
About the journal:
Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.
Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases