Effective framework for protein structure prediction
Nagamma Patil, Durga Toshniwal, K. Garg
Int. J. Funct. Informatics Pers. Medicine, published 2012-11-21. DOI: 10.1504/IJFIPM.2012.050426 (https://doi.org/10.1504/IJFIPM.2012.050426)
This paper presents a computational system that predicts protein structure using N-grams and a wrapper feature selection framework. An N-gram is a subsequence of N consecutive characters extracted from a larger sequence. N-gram features are extracted from a dataset of 277 domains: 70 all-α, 61 all-β, 81 α/β and 65 α+β domains. A wrapper feature selection system, GA-SVM, is applied to obtain an optimised feature set. Using the optimised 3,070-feature subset, a classifier model is trained and tested with a Support Vector Machine (SVM). The model achieves an overall accuracy of 88.09% under 10-fold cross-validation, 4.7% higher than the accuracy obtained with the initial 6,414 features. Experimental results also show that feature subset selection with the proposed GA-SVM wrapper improves classification accuracy relative to other GA-based wrapper approaches and existing protein sequence encoding methods.
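The two stages described in the abstract, N-gram feature extraction and wrapper-style feature selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 20-letter amino acid alphabet, the frequency normalisation, the choice of N and every genetic algorithm setting (population size, truncation selection, one-point crossover, mutation rate) are assumptions made for illustration, and the sketch reads "GA" as genetic algorithm. The `fitness` argument is a placeholder: in a GA-SVM wrapper it would presumably be the cross-validated accuracy of an SVM trained on the selected features only.

```python
import random
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: the 20 standard residues


def ngram_counts(sequence, n):
    """Count every overlapping n-gram (window of n characters) in sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_vector(sequence, n_values=(1, 2)):
    """Return (feature_names, frequencies) over all possible n-grams.

    Each count is divided by the number of windows of that length so that
    sequences of different lengths are comparable (an assumption of this
    sketch, not a detail taken from the paper).
    """
    names, values = [], []
    for n in n_values:
        counts = ngram_counts(sequence, n)
        windows = max(len(sequence) - n + 1, 1)
        for gram in map("".join, product(AMINO_ACIDS, repeat=n)):
            names.append(gram)
            values.append(counts[gram] / windows)
    return names, values


def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.02, seed=0):
    """Search for a 0/1 feature mask that maximises fitness(mask).

    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]           # truncation selection
        children = [scored[0][:]]                  # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]             # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best
```

For example, `ngram_counts("ACAC", 2)` counts `AC` twice and `CA` once, and `ngram_vector(seq, (1, 2))` yields 20 + 400 = 420 features per sequence. Passing a wrapper fitness to `ga_select` means the classifier is retrained for every candidate mask, which is what distinguishes a wrapper from a filter method and also what makes it computationally expensive.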