GPSD: a hybrid learning framework for the prediction of phosphatase-specific dephosphorylation sites.

IF 6.8; Zone 2 (Biology); JCR Q1 (Biochemical Research Methods)
Cheng Han, Shanshan Fu, Miaomiao Chen, Yujie Gou, Dan Liu, Chi Zhang, Xinhe Huang, Leming Xiao, Miaoying Zhao, Jiayi Zhang, Qiang Xiao, Di Peng, Yu Xue
Briefings in Bioinformatics, volume 26, issue 1. Published 2024-11-22. DOI: https://doi.org/10.1093/bib/bbae694. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695897/pdf/

Abstract

Protein phosphorylation is dynamically and reversibly regulated by protein kinases and protein phosphatases, and plays an essential role in orchestrating a wide range of biological processes. Although a number of tools have been developed for predicting kinase-specific phosphorylation sites (p-sites), computational prediction of phosphatase-specific dephosphorylation sites remains a great challenge. In this study, we manually curated 4393 experimentally identified site-specific phosphatase-substrate relationships for 3463 dephosphorylation sites occurring on phosphoserine, phosphothreonine, and/or phosphotyrosine residues, from the literature and public databases. We then developed a hybrid learning framework, the group-based prediction system for the prediction of phosphatase-specific dephosphorylation sites (GPSD). For model training, we integrated 10 types of sequence features and used three types of machine learning methods: penalized logistic regression, deep neural networks, and transformer neural networks. First, a pretrained model was constructed using 561,416 nonredundant p-sites and then fine-tuned to generate computational models for predicting general dephosphorylation sites. In addition, 103 individual phosphatase-specific predictors were constructed via transfer learning and meta-learning. For site prediction, one or more protein sequences in FASTA format can be submitted, and the prediction results are shown together with additional annotations, such as protein-protein interactions, structural information, and disorder propensity. The online service of GPSD is freely available at https://gpsd.biocuckoo.cn/. We believe that GPSD can serve as a valuable tool for further analysis of dephosphorylation.
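
To make the input step concrete, the following Python sketch parses FASTA text and lists every serine, threonine, or tyrosine residue with a fixed-length sequence window around it. It is only an illustration of how candidate sites might be prepared for a sequence-based predictor, not GPSD's actual preprocessing: the function names `parse_fasta` and `candidate_windows`, the flank length of 7, and the `*` padding character are all assumptions made for this example.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, List[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Fall back to a generated name if the header line is empty.
            header = (line[1:].split() or [f"record_{len(records) + 1}"])[0]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def candidate_windows(
    seq: str, flank: int = 7, pad: str = "*"
) -> Iterator[Tuple[int, str, str]]:
    """Yield (1-based position, residue, window) for each S/T/Y residue.

    The window holds `flank` residues on each side of the site; positions
    beyond the sequence ends are filled with `pad` (hypothetical choice).
    """
    padded = pad * flank + seq + pad * flank
    for i, residue in enumerate(seq):
        if residue in "STY":
            # Residue i sits at index i + flank in the padded string.
            yield i + 1, residue, padded[i : i + 2 * flank + 1]


if __name__ == "__main__":
    demo = ">demo_protein example\nMASTY\n"
    for name, sequence in parse_fasta(demo).items():
        for position, residue, window in candidate_windows(sequence, flank=2):
            print(name, position, residue, window)
```

Each window could then be encoded into sequence features and scored by a trained model; which features and window length GPSD uses is not stated in the abstract.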

Source journal: Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.