An explainable stacking-based approach for accelerating the prediction of antidiabetic peptides
Farwa Arshad, Saeed Ahmed, Aqsa Amjad, Muhammad Kabir
Analytical Biochemistry, volume 691, Article 115546. Published 2024-04-25. DOI: 10.1016/j.ab.2024.115546
https://www.sciencedirect.com/science/article/pii/S0003269724000903
Abstract
Diabetes is a chronic disease characterized by high blood sugar levels, and it can have several harmful outcomes. Hyperglycemia, defined as persistently elevated blood sugar, is one of the primary concerns. People can improve their overall well-being and achieve better health outcomes by prioritizing diabetes control. Although the use of experimental approaches in diabetes treatment is cost-effective, it necessitates the development of many strategies for evaluating the efficacy of therapies. Virtual screening with computational tools and procedures enables researchers to quickly develop new strategies for managing diabetes and to gain vital insights. In this study, we propose a predictor named STADIP (STacking-based predictor for AntiDiabetic Peptides), a new method for predicting antidiabetic peptides (ADPs) using a stacking-based ensemble approach. It uses 12 different feature encodings and seven machine-learning techniques to construct 84 baseline models. The impact of the various baseline models on ADP prediction was then thoroughly examined. A two-step feature selection method, eXtreme Gradient Boosting with Sequential Forward Selection (XGB-SFS), was employed to determine the optimal number of PFs, out of 84, for enhancing predictive performance. Subsequently, following the meta-predictor approach, the 45 selected PFs were fed into an XGB classifier to form the final hybrid model. The proposed method demonstrated superior predictive capability compared with its constituent baseline models, as evidenced by evaluations on both cross-validation and independent tests. In independent testing, STADIP achieved promising performance, with an accuracy of 0.954 and a Matthews correlation coefficient of 0.877. It is anticipated that STADIP will be a useful tool in helping the scientific community to identify new antidiabetic proteins.
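The two metrics reported in the abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from the four counts of a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper or from the STADIP code.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts.

    Returns 0.0 when a row or column of the confusion matrix is empty,
    which is the usual convention for the undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not the paper's test set).
tp, tn, fp, fn = 45, 50, 2, 3
print(f"ACC = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the positive and negative classes are imbalanced, which is presumably why both values are reported.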
About the journal:
The journal's title Analytical Biochemistry: Methods in the Biological Sciences declares its broad scope: methods for the basic biological sciences that include biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics and wherever the frontiers of research take the field.
The emphasis is on methods from the strictly analytical to the more preparative that would include novel approaches to protein purification as well as improvements in cell and organ culture. The actual techniques are equally inclusive ranging from aptamers to zymology.
The journal has been particularly active in:
- Analytical techniques for biological molecules
- Aptamer selection and utilization
- Biosensors
- Chromatography
- Cloning, sequencing and mutagenesis
- Electrochemical methods
- Electrophoresis
- Enzyme characterization methods
- Immunological approaches
- Mass spectrometry of proteins and nucleic acids
- Metabolomics
- Nano level techniques
- Optical spectroscopy in all its forms
The journal is reluctant to include most drug and strictly clinical studies as there are more suitable publication platforms for these types of papers.