M. Iqbal, I. Faye, A. M. Md Said, Brahim Belhaouari Samir
2014 International Conference on Computer and Information Sciences (ICCOINS). Published 2014-06-03. DOI: 10.1109/ICCOINS.2014.6868352 (https://doi.org/10.1109/ICCOINS.2014.6868352)
An efficient computational intelligence technique for classification of protein sequences
Many artificial intelligence techniques have been developed to extract meaningful information from a constantly increasing volume of data. Accurately annotating an unknown protein by classifying its sequence into an existing superfamily is considered a critical and challenging task in bioinformatics and computational biology. Such classification helps in analyzing and modeling unknown proteins to determine their structure and function. In this paper, the proposed framework uses a frequency-based feature encoding technique to represent the amino acids of a protein's primary sequence; the technique considers the occurrence frequency of each amino acid in a sequence. Popular classification algorithms such as decision tree, naïve Bayes, neural network, random forest, and support vector machine are employed to evaluate the effectiveness of the encoding method. The results indicate that the decision tree classifier performs significantly better than the other classifiers in terms of classification accuracy, specificity, sensitivity, F-measure, and other measures. A classification accuracy of 88.7% was achieved on yeast protein sequence data taken from the well-known UniProtKB database.
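The abstract does not give the exact form of the encoding, so the following is only a minimal sketch of one common reading of "occurrence frequency of each amino acid": a 20-dimensional vector of relative frequencies over the standard one-letter amino acid alphabet. The function names `frequency_encode` and `binary_metrics`, the choice of relative frequencies rather than raw counts, and the handling of non-standard residues (ignored here) are all assumptions for illustration, not details taken from the paper. The second function shows the usual definitions of accuracy, sensitivity, specificity, and F-measure for a single class treated one-vs-rest.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_encode(sequence):
    """Return a 20-dimensional list of relative amino acid frequencies.

    Assumption: residues outside the standard alphabet (e.g. X, B, Z)
    are ignored, and an empty sequence maps to the all-zero vector.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from one-vs-rest confusion counts."""
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": _ratio(2 * precision * sensitivity,
                            precision + sensitivity),
    }


if __name__ == "__main__":
    # Toy sequence, not real data: A appears twice, C and D once each.
    features = frequency_encode("AACD")
    print(features[:3])
    # Hypothetical confusion counts, chosen only to exercise the formulas.
    print(binary_metrics(tp=8, fp=2, tn=9, fn=1))
```

Each sequence, whatever its length, maps to a fixed-length vector, which is what lets the same features be passed to any of the classifiers listed above. The trade-off is that all positional information in the sequence is discarded.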