Designing predictors of halophilic and non-halophilic proteins using support vector machines

2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). Published: 2013-04-16. DOI: 10.1109/CIBCB.2013.6595414

Hui-Ling Huang, Y. S. Srinivasulu, Phasit Charoenkwan, Hua-Chin Lee, Shinn-Ying Ho


Citations: 2

Abstract



Identifying the molecular features that cause halophilicity in halostable organisms helps in understanding halophilic adaptation. In this study, we propose a machine learning method for predicting halophilic proteins. The study proceeds in six stages. First, we establish a non-redundant dataset of halophilic proteins collected from the NCBI, UniProtKB, and EMBL-EBI databases. The dataset consists of 245 positive and negative proteins with sequence identity below 25%. Second, the protein sequences are represented by three types of feature vector sets: amino acid composition, dipeptide composition, and physicochemical properties. Third, we propose three classifiers based on support vector machines (SVMs) to distinguish halophilic from non-halophilic proteins. Fourth, the independent test accuracies of all three classifiers exceed 83%. Fifth, an inheritable bi-objective combinatorial genetic algorithm is used to select a set of 11 physicochemical properties (PCPs). Sixth, the abundant amino acids, the highly differential dipeptides (amino acid pairs), and the 11 informative PCPs can support the analysis of halophilic and non-halophilic proteins.
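Two of the three feature sets named in the abstract, amino acid composition and dipeptide composition, have widely used standard definitions: the frequency of each of the 20 standard residues (20 features) and the frequency of each of the 400 ordered adjacent residue pairs (400 features). The sketch below computes both in plain Python under those standard definitions. The function names, the alphabetical feature order, and the choice to skip non-standard residues (such as X or B) are assumptions made for illustration, not details taken from the paper. The physicochemical-property features are omitted because the abstract does not say which property scales were used.

```python
from itertools import product
from typing import List

# The 20 standard amino acids (one-letter codes). Alphabetical order is an
# assumption for this sketch; any fixed order works as long as it is consistent.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> List[float]:
    """Return the 20-dim vector of residue frequencies (fractions summing to 1).

    Non-standard residues are skipped (an assumption of this sketch).
    """
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    n = len(residues)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Return the 400-dim vector of adjacent-pair frequencies.

    Pairs containing a non-standard residue are skipped (an assumption).
    """
    s = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for a, b in zip(s, s[1:]):  # every overlapping adjacent pair
        pair = a + b
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[dp] / total for dp in DIPEPTIDES]


if __name__ == "__main__":
    seq = "MKAAE"  # toy sequence, not from the paper's dataset
    aac = amino_acid_composition(seq)
    dpc = dipeptide_composition(seq)
    print(len(aac), len(dpc))  # 20 400
```

Concatenating the two vectors gives a 420-dimensional input that could be passed to an SVM implementation such as LIBSVM or scikit-learn's `SVC`; the kernel and parameter settings used in the paper are not stated in the abstract.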

