An explainable stacking-based approach for accelerating the prediction of antidiabetic peptides

Impact factor 4.6; JCR Q2 (Materials Science, Biomaterials)
Farwa Arshad, Saeed Ahmed, Aqsa Amjad, Muhammad Kabir
DOI: 10.1016/j.ab.2024.115546
Journal: ACS Applied Bio Materials
Published: 2024-04-25 (Journal Article)
URL: https://www.sciencedirect.com/science/article/pii/S0003269724000903
Citations: 0

Abstract

Diabetes is a chronic disease characterized by high blood sugar levels that can have several harmful outcomes. Hyperglycemia, defined as persistently elevated blood sugar, is one of the primary concerns. Prioritizing diabetes control can improve people's overall well-being and health outcomes. Although experimental approaches to diabetes treatment are cost-effective, they require the development of many strategies for evaluating the efficacy of therapies. Computational tools and procedures that enable virtual screening allow researchers to develop new diabetes-management strategies quickly and gain vital insights. In this study, we propose STADIP (STacking-based predictor for AntiDiabetic Peptides), a new method for predicting antidiabetic peptides (ADPs) with a stacking-based ensemble approach. It combines 12 feature encodings with seven machine-learning algorithms to construct 84 baseline models, whose impact on ADP prediction was then thoroughly examined. Each baseline model contributes one probabilistic feature (PF). A two-step feature selection method, eXtreme Gradient Boosting with Sequential Forward Selection (XGB-SFS), was employed to identify the optimal subset of the 84 PFs and thereby enhance predictive performance. The 45 selected PFs were then fed into an XGB meta-classifier to form the final hybrid model. In both cross-validation and independent tests, the proposed method outperformed its constituent baseline models. During extensive independent testing, STADIP achieved an accuracy of 0.954 and a Matthews correlation coefficient (MCC) of 0.877. We anticipate that it will be a useful tool to help the scientific community identify new antidiabetic peptides.
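The abstract describes the pipeline only at a high level. The following is a minimal, stdlib-only Python sketch of the stacking and selection logic under stated assumptions, not the authors' implementation. Each baseline model is assumed to contribute one PF column (a predicted probability). The first step of XGB-SFS is assumed to supply an importance ranking of those columns; XGBoost itself is not called here. The second step is read as adding PFs one at a time in ranked order and keeping the best-scoring prefix. This rank-then-add reading is our assumption; a classic greedy SFS would instead try every remaining feature at each step. A simple average-and-threshold rule stands in for the XGB meta-classifier. The helper names (`forward_selection_by_rank`, `meta_predict`), the PF values, the labels, and the ranking are all hypothetical. The subset score is MCC, computed as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math
from typing import Callable, List, Sequence, Tuple


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient (MCC) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC taken as 0 when undefined
    return acc, mcc


def forward_selection_by_rank(
    ranked: Sequence[int], score: Callable[[List[int]], float]
) -> Tuple[List[int], float]:
    """Hypothetical helper: add PF columns one at a time in ranked order
    (ranking assumed to come from step 1, e.g. XGBoost importances) and
    keep the prefix with the best score."""
    best_subset: List[int] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        candidate = list(ranked[:k])
        s = score(candidate)
        if s > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = candidate, s
    return best_subset, best_score


def meta_predict(rows: Sequence[Sequence[float]], cols: Sequence[int],
                 threshold: float = 0.5) -> List[int]:
    """Toy stand-in for the XGB meta-classifier: average the selected PFs and threshold."""
    return [1 if sum(row[j] for j in cols) / len(cols) >= threshold else 0 for row in rows]


# Hypothetical PF matrix: one row per peptide, one column per baseline model.
# (With 12 encodings x 7 classifiers, STADIP's matrix would have 84 columns.)
pfs = [
    [0.9, 0.8, 0.2],
    [0.4, 0.9, 0.3],
    [0.8, 0.7, 0.4],
    [0.2, 0.3, 0.9],
    [0.6, 0.2, 0.8],
    [0.3, 0.4, 0.7],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = ADP, 0 = non-ADP (hypothetical)
ranking = [0, 1, 2]          # hypothetical importance ranking from step 1

subset, best_mcc = forward_selection_by_rank(
    ranking, lambda cols: accuracy_and_mcc(labels, meta_predict(pfs, cols))[1]
)
print(subset, round(best_mcc, 3))  # prints: [0, 1] 1.0
```

In this toy run, the first PF alone gives an MCC of about 0.333. Adding the second PF separates the classes perfectly, and adding the third, anti-correlated PF lowers the MCC to about 0.707, so the search stops at two columns. In a real pipeline, the PFs and the subset score would come from out-of-fold (cross-validated) predictions. Otherwise the selection would be tuned on labels the baseline models have already seen.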

Source journal: ACS Applied Bio Materials
Subject category: Chemistry (all)
CiteScore: 9.40
Self-citation rate: 2.10%
Articles published: 464