StackAHTPs: An explainable antihypertensive peptides identifier based on heterogeneous features and stacked learning approach
Ali Ghulam, Muhammad Arif, Ahsanullah Unar, Maha A. Thafar, Somayah Albaradei, Apilak Worachartcheewan
IET Systems Biology, vol. 19, no. 1. Published 2025-02-05. DOI: 10.1049/syb2.70002
https://onlinelibrary.wiley.com/doi/10.1049/syb2.70002
Citations: 0
Abstract
Hypertension, often known as high blood pressure, is a major concern for millions of individuals globally, and it is one of the risk factors associated with cardiovascular disorders and other health problems. Recent studies have demonstrated the significant efficacy of naturally derived peptides in reducing blood pressure. Naturally sourced bioactive peptides with antihypertensive properties offer considerable potential as viable substitutes for conventional pharmaceutical medications. Currently, thorough examination of antihypertensive peptides (AHTPs) using traditional wet-lab methods is expensive and laborious. Therefore, in-silico approaches, especially machine-learning (ML) algorithms, are favourable because they save time and cost in the discovery of AHTPs. In this study, a novel ML-based predictor called StackAHTPs was developed for accurately predicting AHTPs from sequence alone. The proposed method utilises two types of feature descriptors, Pseudo-Amino Acid Composition and Dipeptide Composition, to encode the local and global hidden information in peptide sequences. The encoded features are then serially merged and ranked using the SHapley Additive exPlanations (SHAP) algorithm. The top-ranked features are fed into three different ensemble classifiers (Bagging, Boosting, and Stacking) to enhance the prediction performance of the model. The StackAHTPs method achieved superior performance compared with other ML classifiers (AdaBoost, XGBoost, Light Gradient Boosting Machine (LightGBM), Bagging and Boosting) on 10-fold cross-validation and on an independent test. The experimental outcomes demonstrate that the proposed method outperformed the existing methods and achieved an accuracy of 92.25% and an F1-score of 89.67% on the independent test for distinguishing AHTPs from non-AHTPs. The authors believe this research will contribute substantially to the large-scale characterisation of AHTPs and accelerate the drug discovery process. The datasets and features used are available at https://github.com/ali-ghulam/StackAHTPs.
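To make the Dipeptide Composition descriptor mentioned in the abstract concrete, the following is a minimal sketch of the commonly used definition: the frequency of each of the 400 ordered pairs of adjacent residues in a peptide. It is an illustration under stated assumptions, not the authors' implementation. The function name `dipeptide_composition`, the residue ordering, and the choice to skip pairs containing non-standard residues are all hypothetical choices made here; the abstract does not specify them.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, fixing the order of the feature vector.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> list:
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Each entry is the count of one dipeptide divided by the number of
    valid adjacent pairs in the sequence (hypothetical helper).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: pairs with non-standard residues are ignored.
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequences shorter than two residues have no dipeptides.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    # Example input: a short tripeptide written in one-letter code.
    features = dipeptide_composition("IPP")
    nonzero = {d: f for d, f in zip(DIPEPTIDES, features) if f > 0}
    print(len(features), nonzero)
```

In a pipeline like the one the abstract describes, a vector of this kind would be concatenated with the Pseudo-Amino Acid Composition features before feature ranking; how the paper performs that merge and ranking is not detailed in the abstract.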
About the journal:
IET Systems Biology covers intra- and inter-cellular dynamics, using systems- and signal-oriented approaches. Papers that analyse genomic data in order to identify variables and basic relationships between them are considered if the results provide a basis for mathematical modelling and simulation of cellular dynamics. Manuscripts on molecular and cell biological studies are encouraged if the aim is a systems approach to dynamic interactions within and between cells.
The scope includes the following topics:
Genomics, transcriptomics, proteomics, metabolomics, cells, tissue and the physiome; molecular and cellular interaction, gene, cell and protein function; networks and pathways; metabolism and cell signalling; dynamics, regulation and control; systems, signals, and information; experimental data analysis; mathematical modelling, simulation and theoretical analysis; biological modelling, simulation, prediction and control; methodologies, databases, tools and algorithms for modelling and simulation; modelling, analysis and control of biological networks; synthetic biology and bioengineering based on systems biology.