{"title":"Gene regulatory network inference based on modified adaptive lasso.","authors":"Chao Li, Xiaoran Huang, Xiao Luo, Xiaohui Lin","doi":"10.1142/S0219720024500264","DOIUrl":"https://doi.org/10.1142/S0219720024500264","url":null,"abstract":"<p><p>Gene regulatory networks (GRNs) reveal the regulatory interactions among genes and provide a visual tool to explain biological processes. However, how to identify direct relations among genes from gene expression data in the case of high-dimensional and small samples is a critical challenge. In this paper, we proposed a new GRN inference method based on a modified adaptive least absolute shrinkage and selection operator (MALasso). MALasso expands the number of samples based on the distance correlation and defines a new weighting manner for adaptive lasso to remove false positive edges of the networks in the iterative process. Simulated data and gene expression data from DREAM challenge were used to validate the performance of the proposed method MALasso. The comparison results among MALasso, adaptive lasso and six other state-of-the-art methods show that MALasso outperformed the competition methods in AUROCC and AUPRC in most cases and had a better ability to distinguish direct edges from indirect ones. 
Hence, by modifying the adaptive weighting manner of adaptive lasso, MALasso can detect linear and nonlinear relations, remove the false positive edges and identify direct relations among genes more accurately.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450026"},"PeriodicalIF":0.9,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143014473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
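The abstract does not specify MALasso's modified weighting or its distance-correlation sample expansion, so the following is only a minimal sketch of the *standard* adaptive lasso that MALasso builds on, applied to GRN inference by regressing each gene on all others. Assumptions not taken from the paper: a ridge initial estimate, weights `1 / (|beta_init|**gamma + eps)`, plain coordinate descent, and the hypothetical helper names `adaptive_lasso` and `infer_edges`.

```python
import math


def soft_threshold(value, threshold):
    """Lasso shrinkage operator S(value, threshold)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def coordinate_descent(X, y, l1_weights, lam, l2=0.0, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| + (l2/2)||b||^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta (beta starts at zero)
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] + l2 == 0.0:
                continue  # all-zero column and no ridge term: keep b_j = 0
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * l1_weights[j]) / (z[j] + l2)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0, ridge=1e-3, eps=1e-6):
    p = len(X[0])
    # Step 1 (assumption): ridge initial estimate, L1 penalty switched off.
    init = coordinate_descent(X, y, [0.0] * p, 0.0, l2=ridge)
    # Step 2: small initial coefficients receive large penalty weights.
    weights = [1.0 / (abs(b) ** gamma + eps) for b in init]
    # Step 3: weighted lasso with those per-coefficient penalties.
    return coordinate_descent(X, y, weights, lam)


def infer_edges(expr, lam, gamma=1.0):
    """expr[s][g] = expression of gene g in sample s; returns (regulator, target) pairs."""
    n_genes = len(expr[0])
    edges = []
    for target in range(n_genes):
        regs = [g for g in range(n_genes) if g != target]
        X = [[row[g] for g in regs] for row in expr]
        y = [row[target] for row in expr]
        beta = adaptive_lasso(X, y, lam, gamma)
        edges += [(regs[k], target) for k, b in enumerate(beta) if b != 0.0]
    return edges


# Toy usage: y depends on x0 only; x1 is an unrelated signal.
x0 = [math.sin(i) for i in range(1, 31)]
x1 = [math.cos(2.3 * i) for i in range(1, 31)]
beta_demo = adaptive_lasso([[a, b] for a, b in zip(x0, x1)], [3.0 * a for a in x0], lam=0.1)
```

The adaptive weights are what let a weak initial coefficient be driven exactly to zero while a strong one is only mildly shrunk; MALasso's contribution, per the abstract, lies in replacing this weighting scheme.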
{"title":"The use of 4D data-independent acquisition-based proteomic analysis and machine learning to reveal potential biomarkers for stress levels.","authors":"Dehua Chen, Yongsheng Yang, Dongdong Shi, Zhenhua Zhang, Mei Wang, Qiao Pan, Jianwen Su, Zhen Wang","doi":"10.1142/S0219720024500252","DOIUrl":"https://doi.org/10.1142/S0219720024500252","url":null,"abstract":"<p><p>Research suggests that individuals who experience prolonged exposure to stress may be at higher risk for developing psychological stress disorders. Currently, psychological stress is primarily evaluated by professional physicians using rating scales, which may be prone to subjective biases and limitations of the scales. Therefore, it is imperative to explore more objective, accurate, and efficient biomarkers for evaluating the level of psychological stress in an individual. In this study, we utilized 4D data-independent acquisition (4D-DIA) proteomics for quantitative protein analysis, and then employed support vector machine (SVM) combined with SHAP interpretation algorithm to identify potential biomarkers for psychological stress levels. Biomarkers validation was subsequently achieved through machine learning classification and a substantial amount of a priori knowledge derived from the knowledge graph. We performed cross-validation of the biomarkers using two batches of data, and the results showed that the combination of Glyceraldehyde-3-phosphate dehydrogenase and Fibronectin yielded an average area under the curve (AUC) of 92%, an average accuracy of 86%, an average F1 score of 79%, and an average sensitivity of 83%. 
Therefore, this combination may represent a potential approach for detecting stress levels to prevent psychological stress disorders.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450025"},"PeriodicalIF":0.9,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142639951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving drug-target interaction prediction through dual-modality fusion with InteractNet.","authors":"Baozhong Zhu, Runhua Zhang, Tengsheng Jiang, Zhiming Cui, Jing Chen, Hongjie Wu","doi":"10.1142/S0219720024500240","DOIUrl":"https://doi.org/10.1142/S0219720024500240","url":null,"abstract":"<p><p>In the drug discovery process, accurate prediction of drug-target interactions is crucial to accelerate the development of new drugs. However, existing methods still face many challenges in dealing with complex biomolecular interactions. To this end, we propose a new deep learning framework that combines the structural information and sequence features of proteins to provide comprehensive feature representation through bimodal fusion. This framework not only integrates the topological adaptive graph convolutional network and multi-head attention mechanism, but also introduces a self-masked attention mechanism to ensure that each protein binding site can focus on its own unique features and its interaction with the ligand. Experimental results on multiple public datasets show that our method significantly outperforms traditional machine learning and graph neural network methods in predictive performance. 
In addition, our method can effectively identify and explain key molecular interactions, providing new insights into understanding the complex relationship between drugs and targets.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450024"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142689319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAKit: An all-in-one analysis pipeline for identifying novel proteins resulting from variant events at both large and small scales.","authors":"Yan Li, Boran Wang, Zengding Wu, Shiliang Ji, Shi Xu, Caiyi Fei","doi":"10.1142/S0219720024500227","DOIUrl":"https://doi.org/10.1142/S0219720024500227","url":null,"abstract":"<p><p><i>Background:</i> Genetic mutations that cause the inactivation or aberrant activation of essential proteins may trigger alterations or even dysfunctions in cellular signaling pathways, culminating in the development of precancerous lesions and cancer. Mutations and such dysfunctions can result in the generation of \"novel proteins\" that are not part of the conventional human proteome. Identification of these proteins carries a profound potential for unraveling promising drug targets and designing innovative therapeutic models. Despite the emergence of diverse tools for detecting DNA or RNA variants, facilitated by the widespread adoption of nucleotide sequencing technology, these methods primarily target point mutations and exhibit suboptimal performance in detecting large-scale and combinatorial mutations. Additionally, the outcomes of these tools are confined to the genome and transcriptome levels, and do not provide the corresponding protein information resulting from genetic alterations. <i>Results:</i> We present the development of Sequencing Analysis Kit (SAKit), a bioinformatics pipeline for hybrid sequencing analysis integrating long-read and short-read RNA sequencing data. Long reads are utilized for detecting large-scale variations such as gene fusions, exon skipping, intron retention, and aberrant expression in non-coding regions, owing to their excellent coverage capabilities. Short reads serve to validate these findings at breakpoints and splice junctions. 
Conversely, short reads are employed for identifying small-scale variations, including single nucleotide variants, deletions, and insertions, due to their superior sequencing depth, with long reads providing additional validation. SAKit is designed to perform analyses using inter-species configuration files comprising genome references and annotation data, making it applicable to both human and mouse studies. Furthermore, SAKit implements a hierarchical filtering approach to eliminate low-confidence variants and employs open reading frame (ORF) analysis to translate identified variants into protein sequences. <i>Conclusion:</i> SAKit is a robust and versatile bioinformatics tool designed for the comprehensive identification of both large-scale and small-scale variants from RNA-seq data, facilitating the discovery of novel proteins. This pipeline integrates analysis of long-read and short-read sequencing data, offering a powerful solution for researchers in genomics and transcriptomics. SAKit is freely accessible and open-source, available through GitHub (https://github.com/therarna/SAKit) and as a Docker image (https://hub.docker.com/repository/docker/therarna). Implemented primarily within a Snakemake framework using Python, SAKit ensures reproducibility, scalability, and ease of use for the scientific community.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450022"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular dynamics simulations of ribosome-binding sites in theophylline-responsive riboswitch associated with improving the gene expression regulation in chloroplasts.","authors":"Rahim Berahmand, Masoumeh Emadpour, Mokhtar Jalali Javaran, Kaveh Haji-Allahverdipoor, Ali Akbarabadi","doi":"10.1142/S0219720024500239","DOIUrl":"https://doi.org/10.1142/S0219720024500239","url":null,"abstract":"<p><p>An efficient inducible transgene expression system is a valuable tool in recombinant protein production. The synthetic theophylline-responsive riboswitch (theo.RS) can be placed in the 5′ untranslated region of an mRNA and control the translation of the downstream gene in chloroplasts in response to binding of a ligand molecule, theophylline. One of the drawbacks associated with the efficiency of the theo.RS is leakiness in the RS structure, which allows undesired background translation when the switch is expected to be off. The purpose of this study was to detect the factors causing the leak of the theo.RS in the off mode using molecular dynamics (MD) simulations. After appropriate equilibration of the simulation system using the necessary commands, a 40 ns simulation was conducted. Analysis of the solvent-accessible surface area for both ribosome-binding site (RBS) regions indicated that nucleotide 79 of the theo.RS, a guanine, had the highest surface exposure to ribosome access. These results were verified with the study of hydrogen bonding of RBS regions with the RNA structure. 
Therefore, redesigning the RBS regions and avoiding the unmasked nucleotide(s) in the structure may improve the tightness of theo.RS in off mode resulting in the efficient inhibition of translation.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 5","pages":"2450023"},"PeriodicalIF":0.9,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142688505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of a multi-tissue compound-target interaction network of Qingfei Paidu decoction in COVID-19 treatment based on deep learning and transcriptomic analysis.","authors":"Xia Li, Xuetong Zhao, Xinjian Yu, Jianping Zhao, Xiangdong Fang","doi":"10.1142/S0219720024500161","DOIUrl":"10.1142/S0219720024500161","url":null,"abstract":"<p><p>The Qingfei Paidu decoction (QFPDD) is a widely acclaimed therapeutic formula employed nationwide for the clinical management of coronavirus disease 2019 (COVID-19). QFPDD exerts a synergistic therapeutic effect, characterized by its multi-component, multi-target, and multi-pathway action. However, the intricate interactions among the ingredients and targets within QFPDD and their systematic effects in multiple tissues remain undetermined. To address this, we qualitatively characterized the chemical components of QFPDD. We integrated multi-tissue transcriptomic analysis with GraphDTA, a deep learning model, to screen for potential compound-target interactions of QFPDD in multiple tissues. We predicted 13 key active compounds, 127 potential targets and 27 pathways associated with QFPDD across six different tissues. Notably, oleanolic acid-AXL exhibited leading affinity in the heart, blood, and liver. Molecular docking and molecular dynamics simulation confirmed their strong binding affinity. The robust interaction between oleanolic acid and the AXL receptor suggests that AXL is a promising target for developing clinical intervention strategies. Through the construction of a multi-tissue compound-target interaction network, our study further elucidated the mechanisms through which QFPDD effectively combats COVID-19 in multiple tissues. 
Our work also establishes a framework for future investigations into the systemic effects of other Traditional Chinese Medicine (TCM) formulas in disease treatment.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450016"},"PeriodicalIF":0.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141735373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCA-constrained multi-core matrix fusion network: A novel approach for cancer subtype identification.","authors":"Min Li, Zhifang Qi, Liang Liu, Mingzhu Lou, Shaobo Deng","doi":"10.1142/S0219720024500148","DOIUrl":"10.1142/S0219720024500148","url":null,"abstract":"<p><p>Cancer subtyping refers to categorizing a particular cancer type into distinct subtypes or subgroups based on a range of molecular characteristics, clinical manifestations, histological features, and other relevant factors. The identification of cancer subtypes can significantly enhance precision in clinical practice and facilitate personalized diagnosis and treatment strategies. Recent advancements in the field have witnessed the emergence of numerous network fusion methods aimed at identifying cancer subtypes. The majority of these fusion algorithms, however, solely rely on the fusion network of a single core matrix for the identification of cancer subtypes and fail to comprehensively capture similarity. To tackle this issue, in this study, we propose a novel cancer subtype recognition method, referred to as PCA-constrained multi-core matrix fusion network (PCA-MM-FN). The PCA-MM-FN algorithm initially employs three distinct methods to obtain three core matrices. Subsequently, the obtained core matrices are projected into a shared subspace using principal component analysis, followed by a weighted network fusion. Lastly, spectral clustering is conducted on the fused network. 
The results obtained from conducting experiments on the mRNA expression, DNA methylation, and miRNA expression of five TCGA datasets and three multi-omics benchmark datasets demonstrate that the proposed PCA-MM-FN approach exhibits superior accuracy in identifying cancer subtypes compared to the existing methods.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450014"},"PeriodicalIF":0.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142057039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration of autoencoder and graph convolutional network for predicting breast cancer drug response.","authors":"V Abinas, U Abhinav, E M Haneem, A Vishnusankar, K A Abdul Nazeer","doi":"10.1142/S0219720024500136","DOIUrl":"https://doi.org/10.1142/S0219720024500136","url":null,"abstract":"<p><p><b>Background and objectives:</b> Breast cancer is the most prevalent type of cancer among women. The effectiveness of anticancer pharmacological therapy may get adversely affected by tumor heterogeneity that includes genetic and transcriptomic features. This leads to clinical variability in patient response to therapeutic drugs. Anticancer drug design and cancer understanding require precise identification of cancer drug responses. The performance of drug response prediction models can be improved by integrating multi-omics data and drug structure data. <b>Methods:</b> In this paper, we propose an Autoencoder (AE) and Graph Convolutional Network (AGCN) for drug response prediction, which integrates multi-omics data and drug structure data. Specifically, we first converted the high dimensional representation of each omic data to a lower dimensional representation using an AE for each omic data set. Subsequently, these individual features are combined with drug structure data obtained using a Graph Convolutional Network and given to a Convolutional Neural Network to calculate IC50 values for every combination of cell lines and drugs. Then a threshold IC50 value is obtained for each drug by performing K-means clustering of their known IC50 values. Finally, with the help of this threshold value, cell lines are classified as either sensitive or resistant to each drug. <b>Results:</b> Experimental results indicate that AGCN has an accuracy of 0.82 and performs better than many existing methods. 
In addition to that, we have done external validation of AGCN using data taken from The Cancer Genome Atlas (TCGA) clinical database, and we got an accuracy of 0.91. <b>Conclusion:</b> According to the results obtained, concatenating multi-omics data with drug structure data using AGCN for drug response prediction tasks greatly improves the accuracy of the prediction task.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 3","pages":"2450013"},"PeriodicalIF":0.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141761970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
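The abstract says each drug's sensitivity threshold comes from K-means clustering of its known IC50 values but gives no further detail. The sketch below is one plausible reading, not the authors' code: it assumes two clusters in one dimension, initialises the centroids at the minimum and maximum values, and takes the midpoint between the final centroids as the threshold. The function names `drug_threshold` and `classify` are hypothetical.

```python
def kmeans_1d_two(values, n_iter=100):
    """Two-cluster K-means on a list of numbers; returns sorted centroids."""
    centroids = [min(values), max(values)]  # assumption: min/max initialisation
    for _ in range(n_iter):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        new = [sum(g) / len(g) if g else centroids[k] for k, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids)


def drug_threshold(ic50_values):
    """Assumption: threshold = midpoint between the low and high IC50 centroids."""
    low, high = kmeans_1d_two(ic50_values)
    return (low + high) / 2


def classify(ic50, threshold):
    """Lower IC50 means less drug is needed, i.e. the cell line is sensitive."""
    return "sensitive" if ic50 < threshold else "resistant"


# Toy usage with made-up IC50 values for one drug across six cell lines.
threshold = drug_threshold([0.1, 0.2, 0.3, 5.0, 5.5, 6.0])
```

In the pipeline the abstract describes, the value passed to `classify` would be the network's predicted IC50 for a cell line and drug pair; whether IC50 is log-transformed before clustering is not stated and is left out here.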
{"title":"Gtie-Rt: A comprehensive graph learning model for predicting drugs targeting metabolic pathways in human.","authors":"Hayat Ali Shah, Juan Liu, Zhihui Yang","doi":"10.1142/S0219720024500100","DOIUrl":"10.1142/S0219720024500100","url":null,"abstract":"<p><p>Drugs often target specific metabolic pathways to produce a therapeutic effect. However, these pathways are complex and interconnected, making it challenging to predict a drug's potential effects on an organism's overall metabolism. Mapping drugs to their target metabolic pathways in an organism can provide a more complete understanding of the metabolic effects of a drug and help to identify potential drug-drug interactions. In this study, we proposed a machine learning hybrid model Graph Transformer Integrated Encoder (GTIE-RT) for mapping drugs to target metabolic pathways in human. The proposed model is a composite of a Graph Convolution Network (GCN) and transformer encoder for graph embedding and attention mechanism. The output of the transformer encoder is then fed into the Extremely Randomized Trees Classifier to predict target metabolic pathways. The evaluation of the GTIE-RT on the drug dataset demonstrates excellent performance metrics, including accuracy (>95%), recall (>92%), precision (>93%) and F1-score (>92%). 
Compared to other variants and machine learning methods, GTIE-RT consistently shows more reliable results.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":" ","pages":"2450010"},"PeriodicalIF":0.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141727966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Construction of transcript regulation mechanism prediction models based on binding motif environment of transcription factor AoXlnR in <i>Aspergillus oryzae</i>.","authors":"Hiroya Oka, Takaaki Kojima, Ryuji Kato, Kunio Ihara, Hideo Nakano","doi":"10.1142/S0219720024500173","DOIUrl":"10.1142/S0219720024500173","url":null,"abstract":"<p><p>DNA-binding transcription factors (TFs) play a central role in transcriptional regulation mechanisms, mainly through their specific binding to target sites on the genome and regulation of the expression of downstream genes. Therefore, a comprehensive analysis of the function of these TFs will lead to the understanding of various biological mechanisms. However, the functions of TFs <i>in vivo</i> are diverse and complicated, and the identified binding sites on the genome are not necessarily involved in the regulation of downstream gene expression. In this study, we investigated whether DNA structural information around the binding site of TFs can be used to predict the involvement of the binding site in the regulation of the expression of genes located downstream of the binding site. 
Specifically, we calculated the structural parameters based on the DNA shape around the DNA binding motif located upstream of the gene whose expression is directly regulated by one TF AoXlnR from <i>Aspergillus oryzae</i>, and showed that the presence or absence of expression regulation can be predicted from the sequence information with high accuracy ([Formula: see text]-1.0) by machine learning incorporating these parameters.</p>","PeriodicalId":48910,"journal":{"name":"Journal of Bioinformatics and Computational Biology","volume":"22 3","pages":"2450017"},"PeriodicalIF":0.9,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141761969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}