Protein secondary structure prediction through a novel framework of secondary structure transition sites and new encoding schemes

Masood Zamani, S. C. Kremer
2016 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), October 2016
DOI: 10.1109/CIBCB.2016.7758118
Citations: 5

Abstract

In this paper, we propose an ab initio two-stage protein secondary structure (PSS) prediction model built on a novel framework for predicting PSS transition sites with Artificial Neural Networks (ANNs) and Genetic Programming (GP). In the proposed classifier, protein sequences are encoded with new amino acid encoding schemes derived from genetic codon mappings, clustering, and information theory. In the first stage, sequence segments are mapped to regions of the Ramachandran map (a 2D plot), and weight scores are computed from statistical information derived from clusters. Score vectors are then constructed for the mapped regions from the weight scores and PSS transition sites. These score vectors have fewer dimensions than those of commonly used encoding schemes and protein profiles. In the second stage, a two-tier classifier based on an ANN and a GP method is employed. The performance of the two-stage classifier is compared with that of state-of-the-art cascaded machine learning methods, which commonly employ ANNs. The prediction method is evaluated on the latest dataset of non-homologous protein sequences, PISCES [1]. The experimental results and statistical analyses indicate a significantly higher distribution of Q3 scores, approximately 7% higher (p-value < 0.001), than that of cascaded ANN architectures. PSS transition sites provide valuable information about the topological properties of protein sequences, and incorporating this information improves the overall performance of the PSS prediction model.
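The Q3 score reported above is the standard three-state accuracy: the percentage of residues whose secondary structure state (helix H, strand E, coil C) is predicted correctly. The Python sketch below computes it, together with one simple way of locating PSS transition sites in a three-state label string. The transition-site definition used here (a residue whose state differs from that of the preceding residue) and the function names `q3_score` and `transition_sites` are assumptions for illustration only; this is not the authors' method, and their codon-based encodings and two-tier ANN/GP classifier are not reproduced.

```python
from typing import List

VALID_STATES = set("HEC")  # assumed 3-state alphabet: helix, strand, coil


def q3_score(observed: str, predicted: str) -> float:
    """Percentage of residues whose 3-state label is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    if not set(observed + predicted) <= VALID_STATES:
        raise ValueError("labels must be H, E or C")
    correct = sum(1 for o, p in zip(observed, predicted) if o == p)
    return 100.0 * correct / len(observed)


def transition_sites(labels: str) -> List[int]:
    """Hypothetical definition: 0-based indices i where labels[i] != labels[i - 1]."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    observed = "CCHHHHCCEEEC"
    predicted = "CCHHHCCCEEEE"
    print(f"Q3 = {q3_score(observed, predicted):.1f}%")
    print("observed transition sites:", transition_sites(observed))
```

With these example strings, two of the twelve residues are mispredicted, giving Q3 ≈ 83.3%, and the observed sequence has transition sites at indices 2, 6, 8 and 11. Note that a prediction can score well on Q3 while still misplacing segment boundaries (here the predicted helix ends one residue early), which is the kind of error that explicit transition-site information is meant to address.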