{"title":"A breast cancer-specific combinational QSAR model development using machine learning and deep learning approaches.","authors":"Anush Karampuri, Shyam Perugu","doi":"10.3389/fbinf.2023.1328262","DOIUrl":"10.3389/fbinf.2023.1328262","url":null,"abstract":"Breast cancer is the most prevalent and heterogeneous form of cancer affecting women worldwide. Depending on the extent of disease spread, therapeutic strategies in practice include surgery, chemotherapy, radiotherapy, and immunotherapy. Combinational therapy is another strategy that has proven effective in controlling cancer progression: an anchor drug, a well-established primary therapeutic agent with known efficacy for specific targets, is administered with a library drug, a supplementary drug that enhances the efficacy of the anchor drug and broadens the therapeutic approach. Our work focused on harnessing regression-based machine learning (ML) and deep learning (DL) algorithms to develop a structure-activity relationship between the molecular descriptors of drug pairs and their combined biological activity through a quantitative structure-activity relationship (QSAR) model. Eleven widely used ML and DL algorithms were used to develop QSAR models. A total of 52 breast cancer cell lines, 25 anchor drugs, and 51 library drugs were considered in developing the QSAR model. Deep neural networks (DNNs) achieved an R² (coefficient of determination) of 0.94 with an RMSE (root mean square error) of 0.255, making DNNs the most effective algorithm for developing a structure-activity relationship with strong generalization capabilities. 
In conclusion, applying combinational therapy alongside ML and DL techniques represents a promising approach to combating breast cancer.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1328262"},"PeriodicalIF":2.8,"publicationDate":"2024-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10822965/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139577087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier","authors":"Suraiya Akhter, John H. Miller","doi":"10.3389/fbinf.2023.1284705","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1284705","url":null,"abstract":"The use of bacteriocins has emerged as a propitious strategy in the development of new drugs to combat antibiotic resistance, given their ability to kill bacteria with both broad and narrow natural spectra. Hence, a compelling requirement arises for a precise and efficient computational model that can accurately predict novel bacteriocins. Machine learning’s ability to learn patterns and features from bacteriocin sequences that are difficult to capture using sequence matching-based methods makes it a potentially superior choice for accurate prediction. A web application for predicting bacteriocin was created in this study, utilizing a machine learning approach. The feature sets employed in the application were chosen using alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC)-based feature evaluation methods. Initially, potential features were extracted from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We assessed the candidate features first using the Pearson correlation coefficient, followed by separate evaluations with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using reduced feature sets. We obtained the overall top performing model using SVM with ADTree-reduced features, achieving an accuracy of 99.11% and an AUC value of 0.9984 on the testing dataset. 
We also assessed the predictive capabilities of our best-performing models for each reduced feature set relative to our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach. A web application, titled BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), was developed to incorporate the predictive models built using ADTree, GA, and linear SVC-based feature sets. Currently, the web-based tool provides classification results with associated probability values and has options to add new samples in the training data to improve the predictive efficacy. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"8 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139439460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"No-boundary thinking for artificial intelligence in bioinformatics and education","authors":"Prajay Patel, Nisha Pillai, Inimary T. Toby","doi":"10.3389/fbinf.2023.1332902","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1332902","url":null,"abstract":"No-boundary thinking enables the scientific community to reflect in a thoughtful manner and discover new opportunities, create innovative solutions, and break through barriers that might have otherwise constrained their progress. This concept encourages thinking without being confined by traditional rules, limitations, or established norms, and a mindset that is not limited by previous work, leading to fresh perspectives and innovative outcomes. So, where do we see the field of artificial intelligence (AI) in bioinformatics going in the next 30 years? That was the theme of a “No-Boundary Thinking” Session as part of the Mid-South Computational Bioinformatics Society’s (MCBIOS) 19th annual meeting in Irving, Texas. This session addressed various areas of AI in an open discussion and raised perspectives on how popular tools like ChatGPT can be integrated into bioinformatics, how to communicate with scientists in different fields to properly utilize the potential of these algorithms, and how to continue educational outreach to further interest in data science and informatics among the next generation of scientists.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"49 19","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139448061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Protein-lipid interactions and protein anchoring modulate the modes of association of the globular domain of the Prion protein and Doppel protein to model membrane patches","authors":"Patricia Soto, Davis T. Thalhuber, Frank Luceri, Jamie Janos, Mason R. Borgman, Noah M. Greenwood, Sofia Acosta, Hunter Stoffel","doi":"10.3389/fbinf.2023.1321287","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1321287","url":null,"abstract":"The Prion protein is the molecular hallmark of the incurable prion diseases affecting mammals, including humans. The protein-only hypothesis states that the misfolding, accumulation, and deposition of the Prion protein play a critical role in toxicity. The cellular Prion protein (PrPC) anchors to the extracellular leaflet of the plasma membrane and prefers cholesterol- and sphingomyelin-rich membrane domains. Conformational Prion protein conversion into the pathological isoform happens on the cell surface. In vitro and in vivo experiments indicate that Prion protein misfolding, aggregation, and toxicity are sensitive to the lipid composition of plasma membranes and vesicles. A picture of the underlying biophysical driving forces that explain the effect of Prion protein - lipid interactions in physiological conditions is needed to develop a structural model of Prion protein conformational conversion. To this end, we use molecular dynamics simulations that mimic the interactions between the globular domain of PrPC anchored to model membrane patches. In addition, we also simulate the Doppel protein anchored to such membrane patches. The Doppel protein is the closest in the phylogenetic tree to PrPC, localizes in an extracellular milieu similar to that of PrPC, and exhibits a similar topology to PrPC even if the amino acid sequence is only 25% identical. Our simulations show that specific protein-lipid interactions and conformational constraints imposed by GPI anchoring together favor specific binding sites in globular PrPC but not in Doppel. 
Interestingly, the binding sites we found in PrPC correspond to prion protein loops, which are critical in aggregation and prion disease transmission barrier (β2-α2 loop) and in initial spontaneous misfolding (α2-α3 loop). We also found that the membrane re-arranges locally to accommodate protein residues inserted in the membrane surface as a response to protein binding.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"39 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139381635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RxNorm for drug name normalization: a case study of prescription opioids in the FDA adverse events reporting system","authors":"Huyen Le, Ru Chen, Stephen Harris, Hong Fang, Beverly Lyn-Cook, H. Hong, W. Ge, Paul Rogers, Weida Tong, Wen Zou","doi":"10.3389/fbinf.2023.1328613","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1328613","url":null,"abstract":"Numerous studies have been conducted on the US Food and Drug Administration (FDA) Adverse Events Reporting System (FAERS) database to assess post-marketing reporting rates for drug safety review and risk assessment. However, the drug names in the adverse event (AE) reports from FAERS are heterogeneous due to a lack of uniformity in the information submitted mandatorily by pharmaceutical companies and voluntarily by patients, healthcare professionals, and the public. Studies that use FAERS and other spontaneous AE reporting databases without drug name normalization may collect AE reports incompletely because of non-standard drug names, and the accuracy of their results might be affected. In this study, we demonstrated the applicability of RxNorm, developed by the National Library of Medicine, for drug name normalization in FAERS. Using prescription opioids as a case study, we used the RxNorm application program interface (API) to map all FDA-approved prescription opioids described in FAERS AE reports to their equivalent RxNorm Concept Unique Identifiers (RxCUIs) and RxNorm names. The different names of the opioids were then extracted, and their usage frequencies were calculated in a collection of more than 14.9 million AE reports for 13 FDA-approved prescription opioid classes, reported over 17 years. The results showed that a significant number of different names were consistently used for opioids in FAERS reports, with 2,086 different names (out of 7,892) used at least three times and 842 different names used at least ten times for each of the 92 RxNorm names of FDA-approved opioids. 
Our method of using RxNorm API mapping was confirmed to be efficient and accurate and capable of reducing the heterogeneity of prescription opioid names significantly in the AE reports in FAERS; meanwhile, it is expected to have a broad application to different sets of drug names from any database where drug names are diverse and unnormalized. It is expected to be able to automatically standardize and link different representations of the same drugs to build an intact and high-quality database for diverse research, particularly postmarketing data analysis in pharmacovigilance initiatives.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"48 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139383606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Expert opinions in protein bioinformatics: 2022","authors":"Daisuke Kihara","doi":"10.3389/fbinf.2023.1338560","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1338560","url":null,"abstract":"","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"50 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139383582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic risk prediction of cardiovascular diseases among type 2 diabetes patients in the UK Biobank","authors":"Yixuan Ye, Jiaqi Hu, Fuyuan Pang, Can Cui, Hongyu Zhao","doi":"10.3389/fbinf.2023.1320748","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1320748","url":null,"abstract":"Background: Polygenic risk score (PRS) has proved useful in predicting the risk of cardiovascular diseases (CVD) based on the genotypes of an individual, but most analyses have focused on disease onset in the general population. The usefulness of PRS to predict CVD risk among type 2 diabetes (T2D) patients remains unclear. Methods: We built a meta-PRSCVD upon the candidate PRSs developed from state-of-the-art PRS methods for three CVD subtypes of significant importance: coronary artery disease (CAD), ischemic stroke (IS), and heart failure (HF). To evaluate the prediction performance of the meta-PRSCVD, we restricted our analysis to 21,092 white British T2D patients in the UK Biobank, among which 4,015 had CVD events. Results: Results showed that the meta-PRSCVD was significantly associated with CVD risk with a hazard ratio per standard deviation increase of 1.28 (95% CI: 1.23–1.33). The meta-PRSCVD alone predicted the CVD incidence with an area under the receiver operating characteristic curve (AUC) of 0.57 (95% CI: 0.54–0.59). 
When restricted to the early-onset patients (onset age ≤ 55), the AUC was further increased to 0.61 (95% CI: 0.56–0.67). Conclusion: Our results highlight the potential role of genomic screening for secondary prevention of CVD among T2D patients, especially among early-onset patients.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"59 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139384606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying epigenetic aging moderators using the epigenetic pacemaker","authors":"Colin Farrell, Chanyue Hu, Kalsuda Lapborisuth, Kyle Pu, S. Snir, Matteo Pellegrini","doi":"10.3389/fbinf.2023.1308680","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1308680","url":null,"abstract":"Epigenetic clocks are DNA methylation-based chronological age prediction models that are commonly employed to study age-related biology. The difference between the predicted and observed age is often interpreted as a form of biological age acceleration, and many studies have measured the impact of environmental and disease-associated factors on epigenetic age. Most epigenetic clocks are fit using approaches that minimize the error between the predicted and observed chronological age, and as a result, they may not accurately model the impact of factors that moderate the relationship between the actual and epigenetic age. Here, we compare epigenetic clocks that are constructed using penalized regression methods to an evolutionary framework of epigenetic aging with the epigenetic pacemaker (EPM), which directly models DNA methylation as a function of a time-dependent epigenetic state. In simulations, we show that the value of the epigenetic state is impacted by factors such as age, sex, and cell-type composition. Next, in a dataset aggregated from previous studies, we show that the epigenetic state is also moderated by sex and the cell type. Finally, we demonstrate that the epigenetic state is also moderated by toxins in a study on polybrominated biphenyl exposure. 
Thus, we find that the pacemaker provides a robust framework for the study of factors that impact epigenetic age acceleration and that the effect of these factors may be obscured in traditional clocks based on linear regression models.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"47 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139451050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving fluorescence lifetime imaging microscopy phasor accuracy using convolutional neural networks","authors":"Varun Mannam, Jacob P. Brandt, Cody J. Smith, Xiaotong Yuan, S. Howard","doi":"10.3389/fbinf.2023.1335413","DOIUrl":"https://doi.org/10.3389/fbinf.2023.1335413","url":null,"abstract":"Introduction: Although a powerful biological imaging technique, fluorescence lifetime imaging microscopy (FLIM) faces challenges such as a slow acquisition rate, a low signal-to-noise ratio (SNR), and high cost and complexity. To address the fundamental problem of low SNR in FLIM images, we demonstrate how to use pre-trained convolutional neural networks (CNNs) to reduce noise in FLIM measurements. Methods: Our approach uses pre-learned models that have been previously validated on large datasets with different distributions than the training datasets, such as sample structures, noise distributions, and microscopy modalities in fluorescence microscopy, to eliminate the need to train a neural network from scratch or to acquire a large training dataset to denoise FLIM data. In addition, we use the pre-trained networks in the inference stage, where the computation time is in milliseconds and accuracy is better than traditional denoising methods. To separate different fluorophores in lifetime images, the denoised images are then run through an unsupervised machine learning technique named “K-means clustering”. Results and Discussion: The results of the experiments carried out on in vivo mouse kidney tissue, fluorescently labeled bovine pulmonary artery endothelial (BPAE) fixed cells, and fluorescently labeled mouse kidney fixed samples show that our demonstrated method can effectively remove noise from FLIM images and improve segmentation accuracy. Additionally, the performance of our method on out-of-distribution highly scattering in vivo plant samples shows that it can also improve SNR in challenging imaging conditions. 
Our proposed method provides a fast and accurate way to segment fluorescence lifetime images captured using any FLIM system. It is especially effective for separating fluorophores in noisy FLIM images, which is common in in vivo imaging where averaging is not applicable. Our approach significantly improves the identification of vital biologically relevant structures in biomedical imaging applications.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"4 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138944777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Attention network for predicting T-cell receptor-peptide binding can associate attention with interpretable protein structural properties.","authors":"Kyohei Koyama, Kosuke Hashimoto, Chioko Nagao, Kenji Mizuguchi","doi":"10.3389/fbinf.2023.1274599","DOIUrl":"10.3389/fbinf.2023.1274599","url":null,"abstract":"Understanding how a T-cell receptor (TCR) recognizes its specific ligand peptide is crucial for gaining an insight into biological functions and disease mechanisms. Despite its importance, experimentally determining TCR-peptide-major histocompatibility complex (TCR-pMHC) interactions is expensive and time-consuming. To address this challenge, computational methods have been proposed, but they are typically evaluated by internal retrospective validation only, and few researchers have incorporated and tested an attention layer from language models into structural information. Therefore, in this study, we developed a machine learning model based on a modified version of Transformer, a source-target attention neural network, to predict the TCR-pMHC interaction solely from the amino acid sequences of the TCR complementarity-determining region (CDR) 3 and the peptide. This model achieved competitive performance on a benchmark dataset of the TCR-pMHC interaction, as well as on a truly new external dataset. Additionally, by analyzing the results of binding predictions, we associated the neural network weights with protein structural properties. By classifying the residues into large- and small-attention groups, we identified statistically significant properties associated with the largely attended residues such as hydrogen bonds within CDR3. 
The dataset that we created and the ability of our model to provide an interpretable prediction of TCR-peptide binding should increase our knowledge about molecular recognition and pave the way for designing new therapeutics.","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"3 ","pages":"1274599"},"PeriodicalIF":0.0,"publicationDate":"2023-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10759225/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139089614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}