Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods

IF 3.4 | Zone 4 (Biology) | JCR Q2 (Biochemical Research Methods)

Proteomics | Pub Date: 2024-05-27 | DOI: 10.1002/pmic.202400004

Dashleen Kaur, Akanksha Arora, Palani Vigneshwar, Gajendra P. S. Raghava


Citations: 0

Abstract

Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods utilizing BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations, such as no hits or prediction of limited sequences. To overcome these limitations, we further developed machine and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent/validation dataset. To harness the power of similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To facilitate researchers in predicting and designing hormone peptides, we developed a web-based server called HOPPred. This server offers a unique feature that allows the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
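
The abstract does not say how the similarity-based and machine learning outputs are combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a classifier probability is shifted by a fixed bonus when a BLAST hit or a MERCI motif is found, and it shows how the reported Matthews correlation coefficient (MCC) is defined. The function names, the `bonus` value, and the amino acid composition feature are all hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return the fraction of each of the 20 standard residues in seq.

    Assumption: composition is used here only as an example feature vector;
    the abstract does not state which features the models use.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def ensemble_score(ml_prob, blast_label=None, motif_hit=False, bonus=0.5):
    """Combine a classifier probability with similarity evidence.

    blast_label: +1 if the top BLAST hit is a hormonal peptide,
                 -1 if it is non-hormonal, None if there is no hit.
    motif_hit:   True if a hormone-associated motif was found.
    bonus:       hypothetical weight added or subtracted per piece of evidence.
    """
    score = ml_prob
    if blast_label == 1:
        score += bonus
    elif blast_label == -1:
        score -= bonus
    if motif_hit:
        score += bonus
    return score


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With this scheme, a peptide that has no BLAST hit and no motif simply keeps its classifier probability, which is one way to address the "no hits" limitation the abstract mentions for similarity-based methods alone.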

Source journal

Proteomics (Biology: Biochemical Research Methods)

CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months

Journal description: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.