StackAHTPs: An explainable antihypertensive peptides identifier based on heterogeneous features and stacked learning approach

IF 1.9, Zone 4 (Biology), JCR Q4 (Cell Biology)
Ali Ghulam, Muhammad Arif, Ahsanullah Unar, Maha A. Thafar, Somayah Albaradei, Apilak Worachartcheewan
Journal: IET Systems Biology, volume 19, issue 1
Published: 2025-02-05
DOI: 10.1049/syb2.70002
Full text: https://onlinelibrary.wiley.com/doi/10.1049/syb2.70002
Citations: 0

Abstract

Hypertension, commonly known as high blood pressure, is a major concern for millions of people worldwide and is one of the risk factors associated with cardiovascular disorders and other health problems. Recent studies have demonstrated that naturally derived peptides can be effective in reducing blood pressure, and naturally sourced bioactive peptides with antihypertensive properties show considerable potential as viable substitutes for conventional pharmaceutical medications. Currently, thorough examination of antihypertensive peptides (AHTPs) using traditional wet-lab methods is expensive and laborious. In-silico approaches, especially machine-learning (ML) algorithms, are therefore attractive because they save time and cost in the discovery of AHTPs. In this study, a novel ML-based predictor called StackAHTPs was developed for accurately predicting AHTPs from sequence alone. The proposed method utilises two types of feature descriptors, Pseudo-Amino Acid Composition and Dipeptide Composition, to encode the local and global hidden information in peptide sequences. The encoded features are then serially merged and ranked using the SHapley Additive exPlanations (SHAP) algorithm, and the top-ranked features are fed into three different ensemble classifiers (Bagging, Boosting, and Stacking) to enhance the prediction performance of the model. The StackAHTPs method achieved superior performance compared with other ML classifiers (AdaBoost, XGBoost, Light Gradient Boosting Machine (LightGBM), Bagging and Boosting) on 10-fold cross-validation and an independent test. The experimental outcomes demonstrate that the proposed method outperformed existing methods, achieving an accuracy of 92.25% and an F1-score of 89.67% on the independent test for distinguishing AHTPs from non-AHTPs. The authors believe this research will contribute substantially to the large-scale characterisation of AHTPs and accelerate the drug discovery process. The datasets and features used are available at https://github.com/ali-ghulam/StackAHTPs.
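The feature-encoding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: Dipeptide Composition is computed in its usual form, but Pseudo-Amino Acid Composition depends on physicochemical property tables and parameters that the abstract does not give, so plain amino acid composition is used here as a stand-in. The function names and the example peptide are hypothetical, and the SHAP ranking and ensemble classifiers are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All 400 ordered residue pairs, in a fixed order
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 values).

    Assumption: used as a simple stand-in for PseAAC, which is not
    reproduced here.
    """
    seq = seq.upper()
    n = sum(1 for r in seq if r in AMINO_ACIDS)
    return [seq.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each adjacent residue pair in the sequence (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


def encode(seq):
    """Serially merge (concatenate) the two descriptors into one vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


features = encode("IPP")  # hypothetical example input
print(len(features))
```

Each peptide becomes a fixed-length vector regardless of sequence length, which is what allows the merged features to be ranked and passed to standard classifiers.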

Source journal
IET Systems Biology (Biology: Mathematical and Computational Biology)
CiteScore: 4.20
Self-citation rate: 4.30%
Articles published: 17
Review time: >12 weeks
Journal description: IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells. The scope includes the following topics: genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.