A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Site Prediction.

Genomics, Proteomics & Bioinformatics; published 2023-12-01 (Epub 2023-10-19); DOI: 10.1016/j.gpb.2023.03.007
Farzaneh Esmaili, Mahdi Pourmirzaei, Shahin Ramazi, Seyedehsamaneh Shojaeilangari, Elham Yavari
Full text (PDF, PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11082408/pdf/

Abstract

Post-translational modifications (PTMs) play key roles in extending the functional diversity of proteins and thereby regulate diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Defects in the phosphorylation process can lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge on phosphorylation site (p-site) prediction to facilitate future research in this field. First, we comprehensively review all related databases and describe each step of dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we examine p-site prediction methods, which fall into two computational groups: algorithmic and machine learning (ML). We further show that ML-based p-site prediction follows two main approaches, conventional ML and end-to-end deep learning, and give an overview of both. This review also introduces the most important feature extraction techniques used in p-site prediction. Finally, we create three test sets, for general and human species, from proteins newly added in the 2022 release of the database of protein post-translational modifications (dbPTM). Evaluating online p-site prediction tools on these proteins, which are absent from the dbPTM 2019 release, reveals their limitations: the actual performance of these tools on unseen proteins is notably lower than the results reported in their respective research papers.
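The evaluation protocol summarized in the abstract (build a test set from proteins added in dbPTM 2022 but absent from dbPTM 2019, then score prediction tools on it) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: each release is assumed to be loaded as an accession-to-sequence dictionary, annotated p-sites as (accession, 1-based position) pairs, candidate residues are S/T/Y with a window of 2*half+1 residues padded with 'X', and every unannotated S/T/Y is treated as a negative. All function names and the toy data are hypothetical. Sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC) are standard binary-classification metrics; the review may report others.

```python
import math
from typing import Dict, List, Set, Tuple

# (protein accession, 1-based residue position); a hypothetical representation
Site = Tuple[str, int]


def new_proteins(release_new: Dict[str, str],
                 release_old: Dict[str, str]) -> Dict[str, str]:
    """Keep proteins whose accession does not appear in the older release."""
    return {acc: seq for acc, seq in release_new.items() if acc not in release_old}


def candidate_windows(proteins: Dict[str, str], positives: Set[Site],
                      residues: str = "STY",
                      half: int = 7) -> List[Tuple[Site, str, int]]:
    """Return (site, window, label) for every candidate residue.

    Each window holds 2*half+1 residues centred on the candidate, padded
    with 'X' past the termini. Unannotated candidates are labelled 0
    (assumed negatives).
    """
    samples = []
    for acc, seq in proteins.items():
        padded = "X" * half + seq + "X" * half
        for i, aa in enumerate(seq):
            if aa in residues:
                site = (acc, i + 1)
                # seq[i] sits at padded[i + half], the centre of this slice
                window = padded[i:i + 2 * half + 1]
                samples.append((site, window, int(site in positives)))
    return samples


def confusion_metrics(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy and MCC for binary site labels."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(labels) if labels else 0.0
    # MCC is set to 0 when any confusion-matrix marginal is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Toy releases and annotations, invented purely for illustration
    old_release = {"P1": "MSTK"}
    new_release = {"P1": "MSTK", "P2": "AASYTG"}
    annotated = {("P2", 3), ("P2", 5)}
    tool_predictions = {("P2", 3), ("P2", 4)}  # hypothetical tool output

    test_set = new_proteins(new_release, old_release)
    samples = candidate_windows(test_set, annotated, half=2)
    labels = [y for _, _, y in samples]
    preds = [int(site in tool_predictions) for site, _, _ in samples]
    print(confusion_metrics(labels, preds))
```

In the toy run only P2 is new, so it alone forms the test set; its candidates S3, Y4 and T5 yield the windows AASYT, ASYTG and SYTGX. Matching proteins by accession alone is a simplification; a stricter protocol might also filter out sequences similar to a tool's training data.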
