StackDPPred: Multiclass prediction of defensin peptides using stacked ensemble learning with optimized features

IF 4.2 · Zone 3 (Biology) · JCR Q1 · Biochemical Research Methods
Muhammad Arif, Saleh Musleh, Ali Ghulam, Huma Fida, Yasser Alqahtani, Tanvir Alam
Journal: Methods, volume 230, pages 129-139 · Published: 2024-08-22 · Publication type: Journal Article
DOI: 10.1016/j.ymeth.2024.08.001
URL: https://www.sciencedirect.com/science/article/pii/S1046202324001828
Citations: 0

Abstract

Host defense peptides, or antimicrobial peptides (AMPs), are promising candidates for protecting the host against microbial pathogens such as bacteria, viruses, fungi, and yeast. Defensins are a type of AMP that act as potential therapeutic agents and play vital roles in various biological processes. Conventional experiments to identify defensin peptides (DPs) are time-consuming and expensive. Computational methods can therefore compensate for the shortcomings of wet-lab experiments by accurately predicting the functional types of DPs. In this paper, we propose a novel multi-class, ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position-specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM), and feature extraction based on graphical and statistical features (FEGS) descriptors. Next, principal component analysis (PCA) is used to select the best subset of attributes. The optimized features are then fed into single machine learning classifiers and stacking-based ensemble classifiers. Furthermore, an ablation study demonstrates the robustness and efficacy of the stacking approach using reduced features for predicting DPs and their families. On the validation test, the proposed StackDPPred method improves overall accuracy by 13.41% and 7.62% compared to the existing DP predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of selected features to the overall prediction. We believe StackDPPred could serve as a valuable tool for accelerating large-scale screening of DPs and the peptide-based drug discovery process.
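
The encode, reduce, and stack flow described in the abstract can be illustrated with a minimal scikit-learn sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy sequences and their three class labels, the SAAC segment lengths (`n_term`, `c_term`), the choice of base learners (random forest and SVM), the logistic-regression meta-learner, and the number of PCA components. Only SAAC is implemented; SegPSSM, HOGPSSM, and FEGS are omitted because they need inputs that this sketch does not have.

```python
import random

import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment):
    """Fraction of each of the 20 standard amino acids in a segment."""
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    return [segment.count(a) / len(segment) for a in AMINO_ACIDS]


def saac(seq, n_term=5, c_term=5):
    """Split amino acid composition: compositions of the N-terminal,
    middle, and C-terminal parts, concatenated into 60 features.
    The segment lengths are illustrative assumptions."""
    cut = len(seq) - c_term
    head, mid, tail = seq[:n_term], seq[n_term:cut], seq[cut:]
    return np.array(aac(head) + aac(mid) + aac(tail))


def toy_dataset(n_per_class=40, seed=0):
    """Synthetic peptides for three hypothetical classes, each enriched
    in a different set of residues. Not real defensin data."""
    rng = random.Random(seed)
    enriched = ["CKR", "GLA", "DEW"]
    seqs, labels = [], []
    for label, favoured in enumerate(enriched):
        pool = AMINO_ACIDS + favoured * 10
        for _ in range(n_per_class):
            length = rng.randint(25, 45)
            seqs.append("".join(rng.choice(pool) for _ in range(length)))
            labels.append(label)
    return seqs, np.array(labels)


def build_model(n_components=10):
    """Scale -> PCA -> stacking ensemble (assumed base and meta learners)."""
    base = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ]
    stack = StackingClassifier(
        estimators=base,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # out-of-fold base predictions train the meta-learner
    )
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


seqs, y = toy_dataset()
X = np.vstack([saac(s) for s in seqs])
model = build_model().fit(X, y)
print("feature matrix:", X.shape)
print("training accuracy:", model.score(X, y))
```

The meta-learner is fitted on cross-validated predictions of the base learners, which is what distinguishes stacking from simple voting. The accuracy printed here is measured on the training data of a synthetic set and says nothing about the performance reported in the abstract.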

Source journal
Methods (Biology: Biochemical Research Methods)
CiteScore: 9.80
Self-citation rate: 2.10%
Articles published: 222
Review time: 11.3 weeks
About the journal: Methods focuses on rapidly developing techniques in the experimental biological and medical sciences. Each topical issue, organized by a guest editor who is an expert in the area covered, consists solely of invited quality articles by specialist authors, many of them reviews. Issues are devoted to specific technical approaches with emphasis on clear detailed descriptions of protocols that allow them to be reproduced easily. The background information provided enables researchers to understand the principles underlying the methods; other helpful sections include comparisons of alternative methods giving the advantages and disadvantages of particular methods, guidance on avoiding potential pitfalls, and suggestions for troubleshooting.