StackDPPred: Multiclass prediction of defensin peptides using stacked ensemble learning with optimized features

IF 4.2 | Region 3 (Biology) | JCR Q1: Biochemical Research Methods

Methods | Pub Date: 2024-08-22 | DOI: 10.1016/j.ymeth.2024.08.001

Muhammad Arif, Saleh Musleh, Ali Ghulam, Huma Fida, Yasser Alqahtani, Tanvir Alam

Methods, Volume 230, Pages 129-139. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1046202324001828

Citations: 0

Abstract

Host defense peptides, or antimicrobial peptides (AMPs), are promising candidates for protecting the host against microbial pathogens such as bacteria, viruses, fungi, and yeast. Defensins are a type of AMP that act as potential therapeutic agents and play a vital role in various biological processes. Conventional experiments to identify defensin peptides (DPs) are time-consuming and expensive. Computational methods can therefore compensate for the shortcomings of wet-lab experiments by accurately predicting the functional types of DPs. In this paper, we propose a novel multi-class, ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position-specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM), and feature extraction based on graphical and statistical (FEGS) descriptors. Next, principal component analysis (PCA) is used to obtain a reduced, optimized set of attributes. The optimized features are then fed into individual machine learning classifiers and stacking-based ensemble classifiers. Furthermore, an ablation study demonstrates the robustness and efficacy of the stacking approach with reduced features for predicting DPs and their families. On the validation test, the proposed StackDPPred method improves overall accuracy by 13.41% and 7.62% compared with the existing DP predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of selected features to the overall prediction. We believe StackDPPred could serve as a valuable tool for accelerating large-scale screening of DPs and the peptide-based drug discovery process.
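To make the first encoding step concrete, the following is a minimal sketch of split amino acid composition (SAAC) as it is commonly described: the peptide is cut into an N-terminal segment, a middle segment, and a C-terminal segment, and a 20-dimensional residue frequency vector is computed for each, giving 60 features. The abstract does not state the segment lengths used by StackDPPred, so the function names and the default lengths `n_term=5` and `c_term=5` are assumptions chosen only for illustration, not the authors' settings.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the 20-dimensional residue frequency vector of one segment.

    Non-standard characters are ignored; an empty segment yields all zeros.
    """
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def saac(sequence, n_term=5, c_term=5):
    """Split amino acid composition: N-terminal + middle + C-terminal.

    n_term and c_term are hypothetical segment lengths (assumed here,
    not taken from the paper). Returns a list of 60 floats.
    """
    seq = sequence.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence is shorter than n_term + c_term")
    head = seq[:n_term]
    middle = seq[n_term:len(seq) - c_term]
    tail = seq[len(seq) - c_term:]
    return composition(head) + composition(middle) + composition(tail)


if __name__ == "__main__":
    # Toy peptide for illustration only; not a real defensin sequence.
    features = saac("AAAAACCCCCGGGGG")
    print(len(features))  # number of SAAC features
```

In a pipeline like the one the abstract outlines, a vector of this kind would be concatenated with the PSSM-based and FEGS descriptors before dimensionality reduction and classification; those later stages are not shown here.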




Source journal

Methods (Biology: Biochemical Research Methods)

CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks

About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.