Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods

IF 3.4, Zone 4 (Biology), JCR Q2 (BIOCHEMICAL RESEARCH METHODS)
Proteomics, Volume 24, Issue 20, Pub Date: 2024-05-27, DOI: 10.1002/pmic.202400004
Dashleen Kaur, Akanksha Arora, Palani Vigneshwar, Gajendra P. S. Raghava
Citations: 0

Abstract

Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. We first developed similarity-based methods using the BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations, such as returning no hits or covering only a limited number of sequences. To overcome these limitations, we further developed machine learning and deep learning models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent validation dataset. To combine the strengths of the similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96, an accuracy of 89.79%, and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web server called HOPPred. The server offers a unique feature that identifies hormone-associated motifs within hormone peptides. The server can be accessed at https://webs.iiitd.edu.in/raghava/hoppred/.
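The abstract does not say how the ensemble combines the similarity-based and machine learning outputs, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes an additive rule: a similarity hit against a known hormone raises the model probability, a hit against a known non-hormone lowers it, and no hit leaves it unchanged. The function names, the weight of 0.5, the threshold of 0.5, and the toy data are all hypothetical and chosen purely for illustration; none of the numbers are results from the paper. The sketch also shows how accuracy and MCC, two of the metrics reported above, are computed from binary predictions.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple


def ensemble_score(ml_prob: float, similarity_hit: Optional[bool],
                   weight: float = 0.5) -> float:
    """Combine an ML probability with an optional similarity-search verdict.

    similarity_hit: True  = hit against a known hormone,
                    False = hit against a known non-hormone,
                    None  = the search returned no hit.
    The additive weight is an assumption made for this sketch.
    """
    if similarity_hit is None:
        return ml_prob  # fall back to the ML model alone
    return ml_prob + weight if similarity_hit else ml_prob - weight


def classify(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn scores into 0/1 labels with a fixed (assumed) threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, _, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy data, invented for illustration only (1 = hormone, 0 = non-hormone).
labels = [1, 1, 0, 0, 1, 0]
ml_probs = [0.9, 0.4, 0.6, 0.2, 0.55, 0.45]
hits = [None, True, False, None, None, False]

ml_preds = classify(ml_probs)
ensemble_preds = classify([ensemble_score(p, h) for p, h in zip(ml_probs, hits)])

print("ML only  : acc=%.3f mcc=%.3f" % (accuracy(labels, ml_preds), mcc(labels, ml_preds)))
print("Ensemble : acc=%.3f mcc=%.3f" % (accuracy(labels, ensemble_preds), mcc(labels, ensemble_preds)))
```

In this toy set, the two sequences that the ML model alone misclassifies both have a similarity hit, and the additive adjustment moves them to the correct side of the threshold. Sequences without a hit keep their ML score, which reflects the abstract's point that similarity search alone cannot cover every sequence.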

Source journal
Proteomics (Biology: Biochemical Research Methods)
CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months
Journal description: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.