An explainable stacking-based approach for accelerating the prediction of antidiabetic peptides

IF 2.6 | CAS Zone 4 (Biology) | JCR Q2, BIOCHEMICAL RESEARCH METHODS

Analytical Biochemistry | Pub. date: 2024-04-25 | DOI: 10.1016/j.ab.2024.115546

Farwa Arshad, Saeed Ahmed, Aqsa Amjad, Muhammad Kabir

Volume 691, Article 115546 | https://www.sciencedirect.com/science/article/pii/S0003269724000903

Citations: 0

Abstract

Diabetes is a chronic disease characterized by high blood sugar levels, and it can have several harmful outcomes. Hyperglycemia, defined as persistently elevated blood sugar, is one of the primary concerns. Prioritizing diabetes control can improve people's overall well-being and lead to optimal health outcomes. Although experimental approaches to diabetes treatment are cost-effective, they require the development of many strategies for evaluating the efficacy of therapies. Computational tools and procedures that enable virtual screening allow researchers to quickly devise new strategies for managing diabetes and to gain vital insights. In this study, we propose STADIP (STacking-based predictor for AntiDiabetic Peptides), a new method for predicting antidiabetic peptides (ADPs) using a stacking-based ensemble approach. STADIP combines 12 feature encodings with seven machine-learning algorithms to construct 84 baseline models (12 × 7), and the impact of each baseline model on ADP prediction was then examined in detail. A two-step feature selection method, eXtreme Gradient Boosting with Sequential Forward Selection (XGB-SFS), was employed to determine the optimal subset of the 84 probabilistic features (PFs) produced by the baseline models, with the aim of enhancing predictive performance. Using the meta-predictor approach, the 45 selected PFs were then fed into an XGB classifier to form the final hybrid model. Evaluations on both cross-validation and independent tests showed that the proposed method outperformed its constituent baseline models. On the independent test set, STADIP achieved an accuracy of 0.954 and a Matthews correlation coefficient (MCC) of 0.877. It is anticipated to be a useful tool for helping the scientific community identify new antidiabetic peptides.
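STADIP's headline results are reported as accuracy and MCC. The following is a minimal Python sketch, not the authors' evaluation code, showing how both metrics follow from a binary confusion matrix. It assumes label 1 marks an ADP and 0 a non-ADP, and the function names (`confusion_counts`, `accuracy`, `mcc`) are hypothetical. Accuracy is (TP + TN) / (TP + TN + FP + FN), and MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)).

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, TN, FP, FN), assuming 1 = ADP (positive) and 0 = non-ADP (negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row/column total of the matrix is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the STADIP study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts, accuracy(*counts), mcc(*counts))  # (3, 3, 1, 1) 0.75 0.5
```

With these toy labels, TP = 3, TN = 3, FP = 1 and FN = 1, giving an accuracy of 0.75 and an MCC of 0.5. Because MCC uses all four cells of the confusion matrix, it is harder to inflate than accuracy when positives and negatives are imbalanced, which is why it is commonly reported alongside accuracy.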





Source journal

Analytical Biochemistry (Biology: Analytical Chemistry)

CiteScore: 5.70
Self-citation rate: 0.00%
Articles published: 283
Review time: 44 days

Journal description: The journal's title, Analytical Biochemistry: Methods in the Biological Sciences, declares its broad scope: methods for the basic biological sciences, including biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics, and wherever the frontiers of research take the field. The emphasis is on methods ranging from the strictly analytical to the more preparative, including novel approaches to protein purification as well as improvements in cell and organ culture. The techniques themselves are equally inclusive, ranging from aptamers to zymology. The journal has been particularly active in:
- Analytical techniques for biological molecules
- Aptamer selection and utilization
- Biosensors
- Chromatography
- Cloning, sequencing and mutagenesis
- Electrochemical methods
- Electrophoresis
- Enzyme characterization methods
- Immunological approaches
- Mass spectrometry of proteins and nucleic acids
- Metabolomics
- Nano-level techniques
- Optical spectroscopy in all its forms

The journal is reluctant to include most drug and strictly clinical studies, as there are more suitable publication platforms for these types of papers.