{"title":"Stack-HDAC3i: A high-precision identification of HDAC3 inhibitors by exploiting a stacked ensemble-learning framework","authors":"Watshara Shoombuatong , Ittipat Meewan , Lawankorn Mookdarsanit , Nalini Schaduangrat","doi":"10.1016/j.ymeth.2024.08.003","DOIUrl":"10.1016/j.ymeth.2024.08.003","url":null,"abstract":"<div><p>Epigenetics involves reversible modifications in gene expression without altering the genetic code itself. Among these modifications, histone deacetylases (HDACs) play a key role by removing acetyl groups from lysine residues on histones. Overexpression of HDACs is linked to the proliferation and survival of tumor cells. To combat this, HDAC inhibitors (HDACi) are commonly used in cancer treatments. However, pan-HDAC inhibition can lead to numerous side effects. Therefore, isoform-selective HDAC inhibitors, such as HDAC3i, could be advantageous for treating various medical conditions while minimizing off-target effects. To date, computational approaches that use only the SMILES notation without any experimental evidence have become increasingly popular and necessary for the initial discovery of novel potential therapeutic drugs. In this study, we develop an innovative and high-precision stacked-ensemble framework, called Stack-HDAC3i, which can directly identify HDAC3i using only the SMILES notation. Using an up-to-date benchmark dataset, we first employed both molecular descriptors and Mol2Vec embeddings to generate feature representations that cover multi-view information embedded in HDAC3i, such as structural and contextual information. Subsequently, these feature representations were used to train baseline models using nine popular ML algorithms. Finally, the probabilistic features derived from the selected baseline models were fused to construct the final stacked model. Both cross-validation and independent tests showed that Stack-HDAC3i is a high-accuracy prediction model with great generalization ability for identifying HDAC3i. 
Furthermore, in the independent test, Stack-HDAC3i achieved an accuracy of 0.926 and a Matthews correlation coefficient (MCC) of 0.850, which are 0.44–6.11% and 0.83–11.90% higher, respectively, than those of its constituent baseline models.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 147-157"},"PeriodicalIF":4.2,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142078708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
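The Stack-HDAC3i abstract describes fusing the probabilistic outputs of the selected baseline models into a final stacked model, and it reports MCC alongside accuracy. The following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not name the meta-model, so this sketch assumes a logistic-regression meta-learner trained by full-batch gradient descent. The helper names `fit_meta_learner`, `predict` and `mcc` are hypothetical. In a real stacking pipeline, the input probabilities should be out-of-fold predictions so that the meta-learner never sees labels the baseline models were trained on.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_meta_learner(prob_features: List[List[float]], labels: List[int],
                     lr: float = 0.5, epochs: int = 10000) -> List[float]:
    """Fit a logistic-regression meta-learner (an assumed choice) on baseline probabilities.

    Each row holds the probabilities that the baseline models assign to one
    compound; the last returned weight is the bias term.
    """
    n_models = len(prob_features[0])
    weights = [0.0] * (n_models + 1)
    n = len(labels)
    for _ in range(epochs):
        grad = [0.0] * (n_models + 1)
        for x, y in zip(prob_features, labels):
            # zip stops at len(x), so weights[-1] (the bias) is added separately.
            z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y  # derivative of the log loss with respect to z
            for j, xi in enumerate(x):
                grad[j] += err * xi
            grad[-1] += err
        # Full-batch gradient descent on the mean log loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def predict(weights: List[float], x: Sequence[float]) -> int:
    # Stacked prediction: threshold the meta-learner probability at 0.5.
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, you could pass made-up probabilities from three baseline models (rows such as `[0.9, 0.8, 0.95]` for actives and `[0.2, 0.1, 0.3]` for inactives) to `fit_meta_learner`, then score `predict` outputs with `mcc`. Unlike accuracy, MCC uses all four confusion-matrix cells, which is why it is often reported next to accuracy on imbalanced data.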
Methods | Pub Date: 2024-08-22 | DOI: 10.1016/j.ymeth.2024.08.001
Muhammad Arif, Saleh Musleh, Ali Ghulam, Huma Fida, Yasser Alqahtani, Tanvir Alam
{"title":"StackDPPred: Multiclass prediction of defensin peptides using stacked ensemble learning with optimized features","authors":"Muhammad Arif , Saleh Musleh , Ali Ghulam , Huma Fida , Yasser Alqahtani , Tanvir Alam","doi":"10.1016/j.ymeth.2024.08.001","DOIUrl":"10.1016/j.ymeth.2024.08.001","url":null,"abstract":"<div><p>Host defense or antimicrobial peptides (AMPs) are promising candidates for protecting host against microbial pathogens for example bacteria, virus, fungi, yeast. Defensins are the type of AMPs that act as potential therapeutic drug agent and perform vital role in various biological process. Conventional Experiments to identify defensin peptides (DPs) are time consuming and expensive. Thus, the shortcomings of wet lab experiments are leveraged by computational methods to accurately predict the functional types of DPs. In this paper, we aim to propose a novel multi-class ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM) and feature extraction based graphical and statistical (FEGS) descriptors. Next, principal component analysis (PCA) is used to select the best subset of attributes. After that, the optimized features are fed into single machine learning and stacking-based ensemble classifiers. Furthermore, the ablation study demonstrates the robustness and efficacy of the stacking approach using reduced features for predicting DPs and their families. The proposed StackDPPred method improves the overall accuracy by 13.41% and 7.62% compared to existing DPs predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively on validation test. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of selected features to the overall prediction. 
We believe, StackDPPred could serve as a valuable tool accelerating the screening of large-scale DPs and peptide-based drug discovery process.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 129-139"},"PeriodicalIF":4.2,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1046202324001828/pdfft?md5=315d0a8005d4827680fb3f30ae38db5c&pid=1-s2.0-S1046202324001828-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142034770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
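One of the StackDPPred encodings is split amino acid composition (SAAC). It computes residue composition separately for the N-terminal, middle and C-terminal parts of a peptide. The sketch below is minimal and makes several assumptions. The abstract does not state the segment lengths StackDPPred uses, so the defaults `n_term=10` and `c_term=10` are hypothetical, as are the function names. The resulting 60-dimensional vector would then be concatenated with the other descriptors before the PCA step.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment: str) -> List[float]:
    """Fraction of each standard amino acid among the standard residues in a segment."""
    counts = [segment.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def split_aac(seq: str, n_term: int = 10, c_term: int = 10) -> List[float]:
    """Split amino acid composition over N-terminal, middle and C-terminal segments.

    Returns a 60-dimensional vector (3 segments x 20 residues). The default
    segment lengths are illustrative assumptions, not StackDPPred's settings.
    """
    seq = seq.upper()
    if len(seq) <= n_term + c_term:
        raise ValueError("sequence too short for the chosen segment lengths")
    n_seg = seq[:n_term]
    mid_seg = seq[n_term:len(seq) - c_term]
    c_seg = seq[len(seq) - c_term:]
    return composition(n_seg) + composition(mid_seg) + composition(c_seg)
```

Splitting the sequence keeps some positional information that plain composition loses. This matters for defensins, whose termini and cysteine-rich cores differ in makeup.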
Methods | Pub Date: 2024-08-22 | DOI: 10.1016/j.ymeth.2024.08.004
Yong Li, Ru Gao, Shan Liu, Hongqi Zhang, Hao Lv, Hongyan Lai
{"title":"PhosBERT: A self-supervised learning model for identifying phosphorylation sites in SARS-CoV-2-infected human cells","authors":"Yong Li , Ru Gao , Shan Liu , Hongqi Zhang , Hao Lv , Hongyan Lai","doi":"10.1016/j.ymeth.2024.08.004","DOIUrl":"10.1016/j.ymeth.2024.08.004","url":null,"abstract":"<div><p>Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded RNA virus that mainly causes respiratory and enteric diseases and is responsible for the outbreak of coronavirus disease 2019 (COVID-19). Numerous studies have demonstrated that SARS-CoV-2 infection leads to significant dysregulation of the protein post-translational modification profile in human cells. Accurate recognition of phosphorylation sites in host cells will contribute to a deeper understanding of the pathogenic mechanisms of SARS-CoV-2 and also help to screen drugs and compounds with antiviral potential. Therefore, there is a need to develop cost-effective and high-precision computational strategies for specifically identifying phosphorylation sites in SARS-CoV-2-infected cells. In this work, we first implemented a custom neural network model (named PhosBERT) on the basis of ProtBert, a pre-trained protein language model built on the Bidirectional Encoder Representations from Transformers (BERT) architecture and trained with self-supervised learning. PhosBERT was then trained and validated with 5-fold cross-validation on a serine (S)/threonine (T) phosphorylation dataset and on a tyrosine (Y) phosphorylation dataset, respectively. Independent validation results showed that PhosBERT identified S/T phosphorylation sites with an accuracy of 81.9% and an <em>AUC</em> (area under the receiver operating characteristic curve) of 0.896. The prediction accuracy and <em>AUC</em> value for Y phosphorylation sites reached 87.1% and 0.902, respectively. 
These results indicate that the proposed model has good predictive ability and stability and provides a new approach for studying phosphorylation sites in SARS-CoV-2-infected cells.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 140-146"},"PeriodicalIF":4.2,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142046093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
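The PhosBERT abstract reports AUC values of 0.896 (S/T) and 0.902 (Y). As a reminder of what that number measures, here is a minimal sketch computing AUC as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one, with ties counting half (the Mann-Whitney formulation). The function name `roc_auc` is hypothetical. The pairwise loop costs O(n_pos x n_neg), which is fine for illustration. Large benchmark sets would normally use a rank-based version.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75. Because AUC depends only on the ranking of scores, it is threshold-free, unlike the accuracy figures quoted alongside it.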
Methods | Pub Date: 2024-08-21 | DOI: 10.1016/j.ymeth.2024.08.002
Leyi Wei
{"title":"Advanced deep learning approaches enable high-throughput biological and biomedicine data analysis","authors":"Leyi Wei","doi":"10.1016/j.ymeth.2024.08.002","DOIUrl":"10.1016/j.ymeth.2024.08.002","url":null,"abstract":"","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 116-118"},"PeriodicalIF":4.2,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141999174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Methods | Pub Date: 2024-08-19 | DOI: 10.1016/j.ymeth.2024.08.005
Aqsa Amjad, Saeed Ahmed, Muhammad Kabir, Muhammad Arif, Tanvir Alam
{"title":"A novel deep learning identifier for promoters and their strength using heterogeneous features","authors":"Aqsa Amjad , Saeed Ahmed , Muhammad Kabir , Muhammad Arif , Tanvir Alam","doi":"10.1016/j.ymeth.2024.08.005","DOIUrl":"10.1016/j.ymeth.2024.08.005","url":null,"abstract":"<div><p>Promoters, which are short (50–1500 base-pair) DNA regions, have emerged as critical regulators of gene transcription. Numerous serious diseases, such as cancer, cardiovascular diseases, and inflammatory bowel diseases, are caused by genetic variations in promoters. Consequently, the correct identification and characterization of promoters are significant for drug discovery. However, experimental approaches to recognizing promoters and their strengths are costly in time and resources. Therefore, computational techniques are highly desirable for the correct characterization of promoters from unannotated genomic data. Here, we designed a powerful bi-layer deep-learning-based predictor named “PROCABLES”, which classifies DNA samples as promoters or non-promoters in the first phase and as strong or weak promoters in the second phase. The proposed method utilizes five distinct feature types, namely word2vec, k-spaced nucleotide pairs, trinucleotide propensity-based features, trinucleotide composition, and electron–ion interaction pseudopotentials, to extract the hidden patterns from the DNA sequence. Afterwards, a stacked framework is formed by integrating a convolutional neural network (CNN) with a bidirectional long short-term memory (BiLSTM) network using multi-view attributes to train the proposed model. Using ten-fold cross-validation, the PROCABLES model achieved accuracies of 0.971 and 0.920 and MCCs of 0.940 and 0.840 for the first and second layers, respectively. The results indicate that the proposed PROCABLES protocol outperforms advanced computational predictors targeting promoters and their types. 
In summary, this research will provide useful hints for the recognition of large-scale promoters in particular and other DNA problems in general.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 119-128"},"PeriodicalIF":4.2,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1046202324001853/pdfft?md5=4c4374f8b06a9c662b2af0a84d0208ad&pid=1-s2.0-S1046202324001853-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142015945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
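Two of the PROCABLES feature types, k-spaced nucleotide pairs and trinucleotide composition, are simple counting encodings. Below is a minimal sketch using their common formulations (k-spaced pairs are often called CKSNAP). It assumes pair frequencies are normalized by the number of valid pairs and that `k_max` is chosen by the user; the abstract gives neither detail, and the function names are hypothetical. PROCABLES may normalize or order the features differently.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"
DINUCS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]   # 16 pairs
TRINUCS = ["".join(p) for p in product(NUCLEOTIDES, repeat=3)]  # 64 triplets


def k_spaced_pairs(seq: str, k: int) -> List[float]:
    """Frequencies of nucleotide pairs separated by exactly k positions (16 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCS, 0)
    n_pairs = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
            n_pairs += 1
    return [counts[d] / n_pairs if n_pairs else 0.0 for d in DINUCS]


def cksnap(seq: str, k_max: int) -> List[float]:
    """Concatenate k-spaced pair frequencies for k = 0..k_max (16 * (k_max + 1) values)."""
    features: List[float] = []
    for k in range(k_max + 1):
        features.extend(k_spaced_pairs(seq, k))
    return features


def trinucleotide_composition(seq: str) -> List[float]:
    """Frequencies of the 64 overlapping trinucleotides."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCS, 0)
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    return [counts[t] / total if total else 0.0 for t in TRINUCS]
```

With `k = 0` the pair encoding reduces to ordinary dinucleotide composition. Larger gaps capture longer-range co-occurrence, which is why several values of `k` are usually concatenated.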
Methods | Pub Date: 2024-08-06 | DOI: 10.1016/j.ymeth.2024.07.012
Ke Han, Jianchun Wang, Ying Chu, Qian Liao, Yijie Ding, Dequan Zheng, Jie Wan, Xiaoyi Guo, Quan Zou
{"title":"Deep learning based method for predicting DNA N6-methyladenosine sites","authors":"Ke Han , Jianchun Wang , Ying Chu , Qian Liao , Yijie Ding , Dequan Zheng , Jie Wan , Xiaoyi Guo , Quan Zou","doi":"10.1016/j.ymeth.2024.07.012","DOIUrl":"10.1016/j.ymeth.2024.07.012","url":null,"abstract":"<div><p>DNA N6 methyladenine (6mA) plays an important role in many biological processes, and accurately identifying its sites helps one to understand its biological effects more comprehensively. Previous traditional experimental methods are very labor-intensive and traditional machine learning methods also seem to be somewhat insufficient as the database of 6mA methylation groups becomes progressively larger, so we propose a deep learning-based method called multi-scale convolutional model based on global response normalization (CG6mA) to solve the prediction problem of 6mA site. This method is tested with other methods on three different kinds of benchmark datasets, and the results show that our model can get more excellent prediction results.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 91-98"},"PeriodicalIF":4.2,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
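The CG6mA abstract names global response normalization (GRN) as part of its multi-scale convolutional model. GRN was introduced in ConvNeXt V2. The sketch below follows that formulation, adapted to a 1D (channels x sequence length) feature map and written in pure Python for readability. It is an assumption whether CG6mA uses exactly this variant and where it sits among the multi-scale convolutions. `gamma` and `beta` stand in for learnable per-channel parameters, and the function name is hypothetical.

```python
import math
from typing import List


def global_response_norm(x: List[List[float]], gamma: List[float],
                         beta: List[float], eps: float = 1e-6) -> List[List[float]]:
    """Global Response Normalization over a (channels x length) feature map.

    Per-channel L2 norm over positions, divided by the mean norm across
    channels, then a per-channel affine scale plus a residual connection.
    """
    g = [math.sqrt(sum(v * v for v in channel)) for channel in x]  # G(X)_c
    mean_g = sum(g) / len(g)
    n = [gc / (mean_g + eps) for gc in g]                          # N(G(X))_c
    return [
        [gamma[c] * (v * n[c]) + beta[c] + v for v in channel]
        for c, channel in enumerate(x)
    ]
```

Channels whose overall response is above the cross-channel average are amplified, and weaker ones are damped. With `gamma` and `beta` initialized to zero the layer starts as an identity mapping because of the residual term.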
{"title":"Disease trend analysis platform accurately predicts the occurrence of cervical cancer under mixed diseases","authors":"Yuchao Liang , Yuting Guo , Yifei Zhai , Jian Zhou , Wuritu Yang , Yongchun Zuo","doi":"10.1016/j.ymeth.2024.07.011","DOIUrl":"10.1016/j.ymeth.2024.07.011","url":null,"abstract":"<div><p>Cervical cancer (CC) is one of the most common gynecological malignancies. Cytological screening, while being the most common and accurate method for detecting cervical cancer, is both time-consuming and costly. Predicting CC based on bioinformatics can assist in the rapid early screening of CC in clinical practice. Most recent CC prediction methods require a large amount of detection data or sequencing data and are not ideal for CC detection in complex disease samples. We developed the Disease trend analysis platform (Dtap), which can quickly predict the occurrence of diseases using only blood routine data. Blood routine data was collected from 1,292 cervical cancer patients, 4,860 patients with complex diseases, and 4,980 healthy individuals from various sources. The results show that the Dtap-based trend model maintained good and stable performance in the prediction task of multiple datasets as well as complex disease samples. 
Finally, we built DTAPCC (<span><span>http://bioinfor.imu.edu.cn/dtapcc</span></span>), a Dtap-based CC disease prediction platform, to help users quickly predict CC and visualize trend features.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 108-115"},"PeriodicalIF":4.2,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141900435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Methods | Pub Date: 2024-08-02 | DOI: 10.1016/j.ymeth.2024.07.007
Longfei Luo, Zhuokun Tan, Shunfang Wang
{"title":"RSANMDA: Resampling based subview attention network for miRNA-disease association prediction","authors":"Longfei Luo, Zhuokun Tan, Shunfang Wang","doi":"10.1016/j.ymeth.2024.07.007","DOIUrl":"10.1016/j.ymeth.2024.07.007","url":null,"abstract":"<div><p>Many studies have demonstrated the importance of accurately identifying miRNA-disease associations (MDAs) for understanding disease mechanisms. However, the number of known MDAs is significantly fewer than the unknown pairs. Here, we propose RSANMDA, a subview attention network for predicting MDAs. We first extract miRNA and disease features from multiple similarity matrices. Next, using resampling techniques, we generate different subviews from known MDAs. Each subview undergoes multi-head graph attention to capture its features, followed by semantic attention to integrate features across subviews. Finally, combining raw and training features, we use a multilayer scoring perceptron for prediction. In the experimental section, we conducted comparative experiments with other advanced models on both HMDD v2.0 and HMDD v3.2 datasets. We also performed a series of ablation studies and parameter tuning exercises. Comprehensive experiments conclusively demonstrate the superiority of our model. Case studies on lung, breast, and esophageal cancers further validate our method's predictive capability for identifying disease-related miRNAs.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 99-107"},"PeriodicalIF":4.2,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
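The RSANMDA abstract says that different subviews are generated from the known miRNA-disease associations by resampling. The abstract does not specify the scheme (sampling ratio, with or without replacement, number of subviews). The sketch below is therefore a hypothetical version that keeps a random fraction of the known pairs in each subview, sampled without replacement from a seeded generator. The function and parameter names are illustrative only.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (miRNA index, disease index)


def resample_subviews(known_pairs: Sequence[Pair], n_views: int,
                      keep_ratio: float, seed: int = 0) -> List[List[Pair]]:
    """Build subviews by sampling a fraction of known associations without replacement.

    Each subview could then be turned into its own miRNA-disease graph for
    per-view graph attention.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the subviews are reproducible
    k = max(1, round(keep_ratio * len(known_pairs)))
    return [sorted(rng.sample(list(known_pairs), k)) for _ in range(n_views)]
```

Each subview sees a slightly different association graph. Aggregating the per-view embeddings, which the abstract assigns to semantic attention, therefore acts somewhat like bagging over the sparse set of known MDAs.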
Methods | Pub Date: 2024-08-02 | DOI: 10.1016/j.ymeth.2024.07.013
Brian K. McFarlin, Elizabeth A. Bridgeman, John H. Curtis, Jakob L. Vingren, David W. Hill
{"title":"Baker’s yeast beta glucan supplementation was associated with an improved innate immune mRNA expression response after exercise","authors":"Brian K. McFarlin , Elizabeth A. Bridgeman , John H. Curtis , Jakob L. Vingren , David W. Hill","doi":"10.1016/j.ymeth.2024.07.013","DOIUrl":"10.1016/j.ymeth.2024.07.013","url":null,"abstract":"<div><p>Beta glucans are found in many natural sources, however, only Baker’s Yeast Beta Glucan (BYBG) has been well documented to have structure–function effects that are associated with improved innate immune response to stressors (e.g., exercise, infection, etc.). The purpose was to identify a BYBG-associated mRNA expression pattern following exercise. Participants gave IRB-approved consent and were randomized to BYBG (Wellmune®; N=9) or Placebo (maltodextrin; N=10) for 6-weeks prior to performing 90 min of whole-body exercise. Paxgene blood samples were collected prior to exercise (PRE), after exercise (POST), two hours after exercise (2H), and four hours after exercise (4H). Total RNA was isolated and analyzed for the expression of 770 innate immune response mRNA (730 mRNA targets; 40 housekeepers/controls; Nanostring nCounter). The raw data were normalized against housekeeping controls and expressed as Log<sub>2</sub> fold change from PRE for a given condition. Significance was set at p < 0.05 with adjustments for multiple comparisons and false discovery rate. We identified 47 mRNA whose expression was changed after exercise with BYBG and classified them to four functional pathways: 1) Immune Cell Maturation (8 mRNA), 2) Immune Response and Function (5 mRNA), 3) Pattern Recognition Receptors and DAMP or PAMP Detection (25 mRNA), and 4) Detection and Resolution of Tissue Damage (9 mRNA). 
The identified mRNA whose expression was altered after exercise with BYBG may represent an innate immune response pattern and supports previous conclusions that BYBG improves immune response to a future sterile inflammation or infection.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 68-79"},"PeriodicalIF":4.2,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1046202324001816/pdfft?md5=4fc8423bf5cf947185f14dd5609f3c1b&pid=1-s2.0-S1046202324001816-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141888037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
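The BYBG abstract states that raw counts were normalized against housekeeping controls and expressed as log2 fold change from PRE. Below is a minimal sketch of one common scheme. It assumes each sample is scaled so that the geometric mean of its housekeeping genes matches the average across samples; the abstract does not give the exact nCounter settings used. Gene names, sample names and numbers are made up, and counts must be positive because the geometric mean and log2 are undefined for zeros.

```python
import math
from typing import Dict, List

Counts = Dict[str, float]  # gene -> raw count for one sample


def geometric_mean(values: List[float]) -> float:
    return math.exp(sum(math.log(v) for v in values) / len(values))


def housekeeping_normalize(samples: Dict[str, Counts],
                           housekeepers: List[str]) -> Dict[str, Counts]:
    """Scale each sample so its housekeeping geometric mean equals the cross-sample average."""
    hk_geo = {name: geometric_mean([c[g] for g in housekeepers])
              for name, c in samples.items()}
    target = sum(hk_geo.values()) / len(hk_geo)
    return {name: {g: v * target / hk_geo[name] for g, v in c.items()}
            for name, c in samples.items()}


def log2_fold_change(normalized: Dict[str, Counts], baseline: str,
                     genes: List[str]) -> Dict[str, Dict[str, float]]:
    """Log2 fold change of each gene at each time point relative to the baseline sample."""
    base = normalized[baseline]
    return {name: {g: math.log2(c[g] / base[g]) for g in genes}
            for name, c in normalized.items()}
```

For example, if a POST sample has twice the housekeeping signal of PRE and four times the raw count of a target gene, normalization removes the twofold loading difference. The reported change is then log2(2) = 1 rather than 2.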
Methods | Pub Date: 2024-07-31 | DOI: 10.1016/j.ymeth.2024.07.008
Sharaf J. Malebary, Nashwan Alromema, Muhammad Taseer Suleman, Maham Saleem
{"title":"m5c-iDeep: 5-Methylcytosine sites identification through deep learning","authors":"Sharaf J. Malebary , Nashwan Alromema , Muhammad Taseer Suleman , Maham Saleem","doi":"10.1016/j.ymeth.2024.07.008","DOIUrl":"10.1016/j.ymeth.2024.07.008","url":null,"abstract":"<div><p>5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at the carbon-5 position. This modification is one of the most common PTMs and occurs in almost all types of RNA. Conventional laboratory methods do not provide quick, reliable identification of m5c sites. However, the ready availability of sequence data has made it feasible to develop computationally intelligent models that optimize the identification process for accuracy and robustness. The present research focused on the development of in-silico methods built using deep learning models. The encoded sequence data were fed into deep learning models, which included a gated recurrent unit (GRU), long short-term memory (LSTM), and bi-directional LSTM (Bi-LSTM). The models were then subjected to a rigorous evaluation process that included both independent set testing and 10-fold cross-validation. The results revealed that the LSTM-based model, m5c-iDeep, outperformed existing m5c predictors, reaching 99.9% accuracy. 
To facilitate use by researchers, m5c-iDeep was also deployed as a web server accessible at <span><span>https://taseersuleman-m5c-ideep-m5c-ideep.streamlit.app/</span></span>.</p></div>","PeriodicalId":390,"journal":{"name":"Methods","volume":"230 ","pages":"Pages 80-90"},"PeriodicalIF":4.2,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141873837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
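The m5c-iDeep abstract mentions feeding "encoded" sequences into GRU, LSTM and Bi-LSTM models but does not name the encoding. One-hot encoding is a common choice for recurrent models. The sketch below assumes it: each nucleotide becomes a 4-element indicator row, giving a (length x 4) matrix that a sequence model can consume step by step. The function name and the handling of T and unknown symbols are illustrative assumptions.

```python
from typing import List

RNA_BASES = "ACGU"


def one_hot_rna(seq: str) -> List[List[int]]:
    """One-hot encode an RNA sequence as a (length x 4) matrix for an RNN/LSTM input.

    T is treated as U; any other symbol (e.g. N) becomes an all-zero row.
    """
    matrix = []
    for base in seq.upper().replace("T", "U"):
        matrix.append([1 if base == b else 0 for b in RNA_BASES])
    return matrix
```

Each row corresponds to one time step, so a window centered on a candidate cytosine would be presented to the LSTM one nucleotide at a time. The all-zero rows let ambiguous bases pass through without implying any particular nucleotide.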