{"title":"Protein secondary structure prediction through a novel framework of secondary structure transition sites and new encoding schemes","authors":"Masood Zamani, S. C. Kremer","doi":"10.1109/CIBCB.2016.7758118","DOIUrl":null,"url":null,"abstract":"In this paper, we propose an ab initio two-stage protein secondary structure (PSS) prediction model through a novel framework of PSS transition site prediction by using Artificial Neural Networks (ANNs) and Genetic Programming (GP). In the proposed classifier, protein sequences are encoded by new amino acid encoding schemes derived from genetic Codon mappings, Clustering and Information theory. In the first stage, sequence segments are mapped to regions in the Ramachandran map (2D-plot), and weight scores are computed by using statistical information derived from clusters. In addition, score vectors are constructed for the mapped regions using the weight scores and PSS transition sites. The score vectors have fewer dimensions compared to those of commonly used encoding schemes and protein profile. In the second stage, a two-tier classifier is employed based on an ANN and a GP method. The performance of the two-stage classifier is compared to the state-of-the-art cascaded Machine Learning methods which commonly employ ANNs. The prediction method is examined with the latest dataset of nonhomologous protein sequences, PISCES [1]. The experimental results and statistical analyses indicate a significantly higher distribution of Q3 scores, approximately 7% with p-value < 0.001, in comparison to that of cascaded ANN architectures. 
PSS transition sites are valuable information about the topological property of protein sequences and incorporating the information improves the overall performance of the PSS prediction model.","PeriodicalId":368740,"journal":{"name":"2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"218 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIBCB.2016.7758118","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 5
Abstract
In this paper, we propose an ab initio two-stage protein secondary structure (PSS) prediction model built on a novel framework of PSS transition site prediction, using Artificial Neural Networks (ANNs) and Genetic Programming (GP). In the proposed classifier, protein sequences are encoded with new amino acid encoding schemes derived from genetic codon mappings, clustering, and information theory. In the first stage, sequence segments are mapped to regions of the Ramachandran map (2D plot), and weight scores are computed from statistical information derived from clusters. Score vectors are then constructed for the mapped regions from the weight scores and the PSS transition sites. These score vectors have fewer dimensions than those of commonly used encoding schemes and protein profiles. In the second stage, a two-tier classifier based on an ANN and a GP method is employed. The performance of the two-stage classifier is compared with that of state-of-the-art cascaded machine learning methods, which commonly employ ANNs. The prediction method is evaluated on the latest dataset of nonhomologous protein sequences, PISCES [1]. The experimental results and statistical analyses indicate a significantly higher distribution of Q3 scores, by approximately 7% (p-value < 0.001), compared with that of cascaded ANN architectures. PSS transition sites carry valuable information about the topological properties of protein sequences, and incorporating this information improves the overall performance of the PSS prediction model.
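The two quantities the abstract relies on, Q3 scores and PSS transition sites, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual three-state labelling (H for helix, E for strand, C for coil), takes Q3 to be the percentage of residues whose state is predicted correctly, and treats a transition site as any position whose label differs from that of the preceding residue, which is one plausible reading since the abstract does not define the term. The function names `q3_score` and `transition_sites` and the example strings are hypothetical.

```python
from typing import List


def q3_score(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose three-state label (H, E, C) matches."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("label strings must have equal length")
    if not true_ss:
        raise ValueError("label strings must be non-empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def transition_sites(ss: str) -> List[int]:
    """Indices i where the label at i differs from the label at i - 1.

    Assumption: this is an illustrative definition of a transition site,
    not one taken from the paper.
    """
    return [i for i in range(1, len(ss)) if ss[i] != ss[i - 1]]


# Made-up label strings for illustration only (not data from the paper).
true_ss = "CCHHHHCCEEEC"
pred_ss = "CCHHHCCCEEEC"

print(f"Q3 = {q3_score(true_ss, pred_ss):.2f}%")
print("transition sites:", transition_sites(true_ss))
```

In this toy example the prediction misplaces one helix boundary by a single residue, which costs one of the twelve positions in Q3. That is the kind of boundary error a model with explicit transition-site information would aim to reduce.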