PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs


Molecular Diversity. Pub Date: 2024-07-21. DOI: 10.1007/s11030-024-10937-2

Arvind Kumar Yadav, Pradeep Kumar Gupta, Tiratha Raj Singh

Molecular Diversity, volume 28, issue 4, pages 2301-2315. Journal article. https://link.springer.com/article/10.1007/s11030-024-10937-2


Abstract

Protein methyltransferases (PMTs) are a group of enzymes that catalyze the transfer of a methyl group to their substrates. These enzymes play an important role in epigenetic regulation and can methylate various substrates, including DNA, RNA, proteins, and small-molecule secondary metabolites. Dysregulation of methyltransferases is implicated in various human cancers. Given the well-recognized significance of PMTs, reliable and efficient identification methods are essential. In the present work, we propose a machine-learning-based method for the identification of PMTs. Various sequence-based features were calculated, and prediction models were trained with several machine-learning algorithms under tenfold cross-validation. After evaluating each model on the dataset, the support vector machine (SVM) model built on the composition of k-spaced amino acid pairs (CKSAAP) achieved the highest prediction accuracy with balanced sensitivity and specificity. This SVM model also outperformed deep-learning algorithms for the prediction of PMTs. In addition, cross-database validation was performed to ensure the robustness of the model. Feature importance was assessed using Shapley additive explanations (SHAP) values, providing insights into the contributions of different features to the model’s predictions. Finally, because of its consistent performance during independent testing and cross-database evaluation, the SVM-based CKSAAP model was implemented in a standalone tool, PMTPred. We believe that PMTPred will be a useful and efficient tool for the identification of PMTs. PMTPred is freely available for download at https://github.com/ArvindYadav7/PMTPred and http://www.bioinfoindia.org/PMTPred/home.html for research and academic use.
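The CKSAAP descriptor named in the abstract can be illustrated with a short sketch. It follows the usual definition of the encoding: for each gap k, count every ordered pair of residues separated by exactly k positions and divide by the number of such positions in the sequence. The function name `cksaap`, the default `k_max = 3`, and the handling of non-standard residues are assumptions made for illustration; the abstract does not state the settings used in PMTPred, and this is not the authors' implementation.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 400 ordered residue pairs, in a fixed order so feature positions are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence: str, k_max: int = 3) -> list[float]:
    """Return CKSAAP features for gaps k = 0..k_max (k_max is an assumed default).

    For each k, the frequency of every ordered pair (seq[i], seq[i + k + 1])
    is computed, giving 400 * (k_max + 1) values in total.
    """
    seq = sequence.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        windows = len(seq) - k - 1  # number of positions i with a partner at i + k + 1
        for i in range(max(windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # assumption: pairs with non-standard residues are skipped
                counts[pair] += 1
        denom = windows if windows > 0 else 1  # avoid division by zero on short input
        features.extend(counts[p] / denom for p in PAIRS)
    return features
```

Calling `cksaap(protein_sequence)` on a one-letter amino acid string yields a fixed-length numeric vector (1600 values with the assumed `k_max = 3`), which is the kind of input an SVM classifier could then be trained on under tenfold cross-validation.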


