iMulti-HumPhos: a multi-label classifier for identifying human phosphorylated proteins using multiple kernel learning based support vector machines†

Impact factor: 3.743; JCR quartile: Q2 (Biochemistry, Genetics and Molecular Biology)
Md. Al Mehedi Hasan, Shamim Ahmad and Md. Khademul Islam Molla
Molecular BioSystems, 8, 1608-1618. Published 2017-06-16. DOI: 10.1039/C7MB00180K
https://pubs.rsc.org/en/content/articlelanding/2017/mb/c7mb00180k
Citations: 15

Abstract

Protein phosphorylation plays a role in regulating protein conformation and function. Identifying whether an uncharacterized protein sequence is a phosphorylated protein is therefore a meaningful and pressing problem for both basic research and drug development. Although many computational methods have been developed to identify the phosphorylation sites of a known phosphorylated protein, very few can predict whether an uncharacterized protein can be phosphorylated at all, so there is room for improvement on this task. Three types of amino acid residues, namely serine, threonine, and tyrosine, have been found to be susceptible to phosphorylation, and a single protein may be phosphorylated at more than one of these residue types, which makes phosphorylated protein identification a multi-label problem. In this study, a novel computational tool termed iMulti-HumPhos has been developed to predict multi-label phosphorylated proteins by (1) extracting three different sets of features from protein sequences, (2) defining an individual kernel for each set of features and combining them into a single kernel using multiple kernel learning, and (3) constructing a multi-label predictor using a combination of support vector machines (SVMs), where each SVM has been trained with the combined kernel. In addition, the effect of the skewed training dataset has been balanced using the Different Error Costs method. The experimental results show that iMulti-HumPhos performs significantly better than the existing predictor Multi-iPPseEvo. A user-friendly web server for iMulti-HumPhos is available at http://research.ru.ac.bd/iMulti-HumPhos/.
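
The three steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The RBF kernels, the hand-fixed kernel weights (multiple kernel learning would learn them from data), the toy feature values, and the rule C_pos / C_neg = n_neg / n_pos (a common heuristic for Different Error Costs) are all assumptions. The abstract does not say how the SVMs are combined, so the one-binary-problem-per-residue-type split is only one plausible reading. All function and variable names are hypothetical.

```python
import math

LABELS = ("S", "T", "Y")  # serine, threonine, tyrosine


def rbf_kernel(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = math.exp(-gamma * sq_dist)
    return K


def combine_kernels(kernels, weights):
    """Convex combination sum_m w_m * K_m of equally sized Gram matrices."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


def binary_targets(label_sets):
    """Split multi-label annotations into one 0/1 target vector per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in LABELS}


def error_costs(y, base_c=1.0):
    """Per-class penalties (assumed heuristic): C_pos / C_neg = n_neg / n_pos."""
    n_pos = sum(1 for v in y if v == 1)
    n_neg = len(y) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {1: base_c * n_neg / n_pos, 0: base_c}


# Toy data invented for illustration: 4 proteins, 3 feature sets ("views").
view_a = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.3]]
view_b = [[1.0], [0.9], [0.2], [0.1]]
view_c = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.1], [0.0, 0.2, 0.9], [0.1, 0.1, 0.8]]
label_sets = [{"S"}, {"S", "T"}, {"Y"}, {"S", "T", "Y"}]

# Step 2: one kernel per feature set, merged into a single kernel.
kernels = [rbf_kernel(view_a, 1.0), rbf_kernel(view_b, 0.5), rbf_kernel(view_c, 2.0)]
K = combine_kernels(kernels, [0.5, 0.3, 0.2])  # weights fixed by hand here

# Step 3 (setup only): one binary problem per label, each with its own costs.
targets = binary_targets(label_sets)
costs = {lab: error_costs(y) for lab, y in targets.items()}
```

The combined matrix `K` and the per-label `costs` could then be handed to any SVM solver that accepts a precomputed kernel and per-class penalties, for example scikit-learn's `SVC(kernel="precomputed", class_weight=...)`, with one classifier trained per label.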

Source journal: Molecular BioSystems (Biology: Biochemistry and Molecular Biology)
CiteScore: 2.94
Self-citation rate: 0.00%
Articles published: 0
Review time: 2.6 months
About the journal: Molecular Omics publishes molecular level experimental and bioinformatics research in the -omics sciences, including genomics, proteomics, transcriptomics and metabolomics. We will also welcome multidisciplinary papers presenting studies combining different types of omics, or the interface of omics and other fields such as systems biology or chemical biology.