AIPs-DeepEnC-GA: Predicting anti-inflammatory peptides using embedded evolutionary and sequential feature integration with genetic algorithm based deep ensemble model
Ali Raza , Jamal Uddin , Quan Zou , Shahid Akbar , Wajdi Alghamdi , Ruijun Liu
Journal: Chemometrics and Intelligent Laboratory Systems, volume 254, Article 105239
Published: 2024-09-29
DOI: 10.1016/j.chemolab.2024.105239
Impact factor: 3.7 (JCR Q2, Automation & Control Systems)
URL: https://www.sciencedirect.com/science/article/pii/S0169743924001795
Citations: 0
Abstract
Inflammation is a biological response to harmful stimuli, including infections, damaged cells, tissue injuries, and toxic chemicals. It plays an essential role in tissue repair by eliminating pathogenic microorganisms. Numerous therapies are currently applied to treat autoimmune and inflammatory diseases. However, conventional anti-inflammatory medications are often labor-intensive, costly, and associated with adverse side effects. Recently, researchers have identified anti-inflammatory peptides (AIPs) as a cost-effective alternative for treating several inflammatory diseases, owing to their high selectivity for target cells and minimal side effects. In this study, we introduce a novel computational predictor, AIPs-DeepEnC-GA, developed to accurately predict AIPs. The training sequences are encoded with embedded evolutionary features based on a novel n-spaced dipeptide-based position-specific scoring matrix (NsDP-PSSM) and the pseudo position-specific scoring matrix (PsePSSM). Additionally, the reduced amino acid alphabet (RAAA-11) and composite physicochemical properties (CPP) are employed to capture cluster-physicochemical properties based on structural information. A hybrid feature strategy is then applied, integrating the embedded evolutionary features with the CPP and RAAA-11 descriptors to overcome the limitations of individual encoding methods. Minimum redundancy maximum relevance (mRMR) is used to select the optimal features. Four different deep-learning models are trained on the selected features. The predictive labels generated by these models are provided to a genetic algorithm to form a deep-ensemble model. On the training sequences, the proposed AIPs-DeepEnC-GA model achieved a ∼15 % increase in predictive accuracy, reaching 94.39 %, and a 19 % improvement in the area under the curve (AUC), reaching 0.98. On the independent datasets Ind-I and Ind-II, our method obtained accuracies of 91.87 % and 89.21 %, with AUC values of 0.94 and 0.92, respectively. The proposed AIPs-DeepEnC-GA model demonstrates an 11 % improvement in predictive accuracy over existing AIP computational models using training samples. The efficacy and reliability of this model make it a promising tool for both drug development and academic research.
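The abstract states that the outputs of four deep-learning models are passed to a genetic algorithm to form the ensemble, but it does not give the encoding, fitness function, or operators. The following is a minimal sketch of one common way to do this, assuming the genetic algorithm searches for per-model weights that maximize the accuracy of a weighted vote. The function names (`ensemble_accuracy`, `ga_ensemble_weights`), the hyperparameters, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import random


def ensemble_accuracy(weights, probs, labels, threshold=0.5):
    """Accuracy of a weighted average of base-model outputs.

    probs[i][j] is model j's output for sample i (a probability, or a
    hard 0/1 label); labels[i] is the true class (0 or 1).
    """
    total = sum(weights)
    if total == 0:
        return 0.0  # an all-zero weight vector cannot vote
    correct = 0
    for row, y in zip(probs, labels):
        score = sum(w * p for w, p in zip(weights, row)) / total
        correct += int((score >= threshold) == bool(y))
    return correct / len(labels)


def ga_ensemble_weights(probs, labels, pop_size=30, generations=40,
                        mutation_rate=0.2, seed=0):
    """Search for ensemble weights in [0, 1] with a simple elitist GA.

    Assumes at least two base models and pop_size >= number of models.
    """
    rng = random.Random(seed)
    n_models = len(probs[0])

    def fitness(w):
        return ensemble_accuracy(w, probs, labels)

    # Random initial population of weight vectors.
    population = [[rng.random() for _ in range(n_models)]
                  for _ in range(pop_size)]
    # Seed one-hot vectors so the search starts from every single model.
    for i in range(n_models):
        population[i] = [1.0 if j == i else 0.0 for j in range(n_models)]

    for _ in range(generations):
        # Elitism: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_models)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:      # Gaussian mutation, clipped
                k = rng.randrange(n_models)
                child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.2)))
            children.append(child)
        population = elite + children

    best = max(population, key=fitness)
    return best, fitness(best)


# Synthetic stand-in for the outputs of four base models (illustration only).
data_rng = random.Random(42)
labels = [data_rng.randint(0, 1) for _ in range(200)]
noise_levels = [0.35, 0.40, 0.45, 0.50]  # hypothetical per-model noise
probs = [[min(1.0, max(0.0, 0.2 + 0.6 * y + data_rng.gauss(0, s)))
          for s in noise_levels] for y in labels]

weights, acc = ga_ensemble_weights(probs, labels)
print("weights:", [round(w, 3) for w in weights], "accuracy:", round(acc, 3))
```

Because the best individuals are carried over unchanged and the initial population contains each single model as a one-hot weight vector, the returned ensemble is never less accurate than the best single base model on the data used for fitness. In practice the fitness would be computed on held-out validation predictions rather than on the data the base models were trained on.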
About the journal:
Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines.
Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data.
The journal deals with the following topics:
1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, Systems Biology, -Omics, etc.)
2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration etc.) Routine applications of established chemometrical techniques will not be considered.
3) Development of new software that provides novel tools or truly advances the use of chemometrical methods.
4) Well characterized data sets to test performance for the new methods and software.
The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.