{"title":"Testing and overcoming the limitations of modular response analysis.","authors":"Jean-Pierre Borg, Jacques Colinge, Patrice Ravel","doi":"10.1093/bib/bbaf098","DOIUrl":"10.1093/bib/bbaf098","url":null,"abstract":"<p><p>Modular response analysis (MRA) is an effective method to infer biological networks from perturbation data. However, it has several limitations, such as strong sensitivity to noise, the need to perform independent perturbations that hit a single node at a time, and a linear approximation of dependencies within the network. Previously, we addressed the sensitivity of MRA to noise by reinterpreting MRA as a multilinear regression problem. We demonstrated the advantages of this approach over conventional MRA and other known inference methods, particularly in handling noisy measurements and nonlinear networks. Here, we provide new contributions to complement this theory. First, we overcome the need for perturbations to be independent, thereby augmenting MRA applicability. Second, using analysis of variance and lack-of-fit tests, we can now assess MRA compatibility with the data and identify the primary source of errors. In cases where nonlinearity prevails, we propose extending the model to a second-order polynomial. Third, we demonstrate how to effectively use prior knowledge about a network. We validated these results using 4 networks with known dynamics (3, 4, and 6 nodes) and 40 simulated networks, ranging from 10 to 200 nodes. Finally, we incorporated these innovations into our R software package MRARegress to offer a comprehensive, extended theory for MRA and to facilitate its use by the community. 
Mathematical aspects, test details, and scripts are provided as Supplementary Information (see 'Data Availability Statement').</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11891662/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143584585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MRAgent: an LLM-based automated agent for causal knowledge discovery in disease via Mendelian randomization.","authors":"Wei Xu, Gang Luo, Weiyu Meng, Xiaobing Zhai, Keli Zheng, Ji Wu, Yanrong Li, Abao Xing, Junrong Li, Zhifan Li, Ke Zheng, Kefeng Li","doi":"10.1093/bib/bbaf140","DOIUrl":"10.1093/bib/bbaf140","url":null,"abstract":"<p><p>Understanding causality in medical research is essential for developing effective interventions and diagnostic tools. Mendelian Randomization (MR) is a pivotal method for inferring causality through genetic data. However, MR analysis often requires pre-identification of exposure-outcome pairs from clinical experience or literature, which can be challenging to obtain. This poses difficulties for clinicians investigating causal factors of specific diseases. To address this, we introduce MRAgent, an innovative automated agent leveraging Large Language Models (LLMs) to enhance causal knowledge discovery in disease research. MRAgent autonomously scans scientific literature, discovers potential exposure-outcome pairs, and performs MR causal inference using extensive Genome-Wide Association Study data. We conducted both automated and human evaluations to compare different LLMs in operating MRAgent and provided a proof-of-concept case to demonstrate the complete workflow. MRAgent's capability to conduct large-scale causal analyses represents a significant advancement, equipping researchers and clinicians with a robust tool for exploring and validating causal relationships in complex diseases. 
Our code is public at https://github.com/xuwei1997/MRAgent.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11975362/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143802516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FGeneBERT: function-driven pre-trained gene language model for metagenomics.","authors":"Chenrui Duan, Zelin Zang, Yongjie Xu, Hang He, Siyuan Li, Zihan Liu, Zhen Lei, Ju-Sheng Zheng, Stan Z Li","doi":"10.1093/bib/bbaf149","DOIUrl":"https://doi.org/10.1093/bib/bbaf149","url":null,"abstract":"<p><p>Metagenomic data, comprising mixed multi-species genomes, are prevalent in diverse environments like oceans and soils, significantly impacting human health and ecological functions. However, current research relies on K-mer, which limits the capture of structurally and functionally relevant gene contexts. Moreover, these approaches struggle with encoding biologically meaningful genes and fail to address the one-to-many and many-to-one relationships inherent in metagenomic data. To overcome these challenges, we introduce FGeneBERT, a novel metagenomic pre-trained model that employs a protein-based gene representation as a context-aware and structure-relevant tokenizer. FGeneBERT incorporates masked gene modeling to enhance the understanding of inter-gene contextual relationships and triplet enhanced metagenomic contrastive learning to elucidate gene sequence-function relationships. Pre-trained on over 100 million metagenomic sequences, FGeneBERT demonstrates superior performance on metagenomic datasets at four levels, spanning gene, functional, bacterial, and environmental levels and ranging from 1 to 213 k input sequences. 
Case studies of ATP synthase and gene operons highlight FGeneBERT's capability for functional recognition and its biological relevance in metagenomic research.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986344/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143974992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep generative model for protein subcellular localization prediction.","authors":"Guo-Hua Yuan, Jinzhe Li, Zejun Yang, Yao-Qi Chen, Zhonghang Yuan, Tao Chen, Wanli Ouyang, Nanqing Dong, Li Yang","doi":"10.1093/bib/bbaf152","DOIUrl":"https://doi.org/10.1093/bib/bbaf152","url":null,"abstract":"<p><p>Protein sequence not only determines its structure but also provides important clues of its subcellular localization. Although a series of artificial intelligence models have been reported to predict protein subcellular localization, most of them provide only textual outputs. Here, we present deepGPS, a deep generative model for protein subcellular localization prediction. After training with protein primary sequences and fluorescence images, deepGPS shows the ability to predict cytoplasmic and nuclear localizations by reporting both textual labels and generative images as outputs. In addition, cell-type-specific deepGPS models can be developed by using distinct image datasets from different cell lines for comparative analyses. Moreover, deepGPS shows potential to be further extended for other specific organelles, such as vesicles and endoplasmic reticulum, even with limited volumes of training data. 
Finally, the openGPS website (https://bits.fudan.edu.cn/opengps) is constructed to provide a publicly accessible and user-friendly platform for studying protein subcellular localization and function.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11986326/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143975739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MIRACN: a residual convolutional neural network for predicting cell line specific functional regulatory variants.","authors":"Zeyin Li, Min Wang, Songge Li, Fangyuan Shi","doi":"10.1093/bib/bbaf196","DOIUrl":"https://doi.org/10.1093/bib/bbaf196","url":null,"abstract":"<p><p>In the post-genome-wide association study era, interpretation of noncoding variants remains a significant challenge due to their complexity and the limited understanding of their functions. Here, we developed MIRACN, a novel residual convolutional neural network designed to predict cell line-specific functional regulatory variants. By utilizing a substantial dataset from massively parallel reporter assays (MPRAs) and employing a multitask learning strategy, MIRACN was trained across seven distinct cell lines, attaining superior performance compared to existing methods, especially in predicting cell type specificity. Comparative evaluations on an independent MPRA test dataset demonstrated that MIRACN not only outperformed existing methods in identifying regulatory variants but also provided valuable insights into their cellular context-specific regulatory mechanisms. MIRACN is capable of not only providing scores for functional variants but also pinpointing the specific cell line in which these variants display their function. 
This enhancement has improved the resolution of current research on the functionality of noncoding variants and has paved the way for more precise diagnostic and therapeutic strategies.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12021264/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143976948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PathSynergy: a deep learning model for predicting drug synergy in liver cancer.","authors":"Fengyue Zhang, Xuqi Zhao, Jinrui Wei, Lichuan Wu","doi":"10.1093/bib/bbaf192","DOIUrl":"https://doi.org/10.1093/bib/bbaf192","url":null,"abstract":"<p><p>Cancer is a major public health problem, and liver cancer is a leading cause of global cancer-related deaths. A previous study demonstrated that the 5-year survival rate for advanced liver cancer is only 30%. Few first-line targeted drugs, including sorafenib and lenvatinib, are available, and resistance to them often develops. Drug combination therapy is crucial for improving the efficacy of cancer therapy and overcoming resistance. However, traditional methods for discovering drug synergy are costly and time consuming. In this study, we developed a novel predictive model, PathSynergy, by integrating drug feature data, cell line data, drug-target interactions, and signaling pathways. PathSynergy combined the advantages of graph neural networks and pathway map mapping. Compared with other baseline models, PathSynergy showed better performance in model classification, accuracy, and precision. Excitingly, six Food and Drug Administration (FDA)-approved drugs, including pimecrolimus, topiramate, nandrolone decanoate, fluticasone propionate, zanubrutinib, and levonorgestrel, were predicted and validated to show synergistic effects with sorafenib or lenvatinib against liver cancer for the first time. 
In general, the PathSynergy model provides a new perspective to discover synergistic combinations of drugs and has broad application potential in the fields of drug discovery and personalized medicine.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12021016/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143972948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"sTPLS: identifying common and specific correlated patterns under multiple biological conditions.","authors":"Jinyu Chen, Wenwen Min","doi":"10.1093/bib/bbaf195","DOIUrl":"https://doi.org/10.1093/bib/bbaf195","url":null,"abstract":"<p><p>The rapidly emerging large-scale data in diverse biological research fields present valuable opportunities to explore the underlying mechanisms of tissue development and disease progression. However, few existing methods can simultaneously capture common and condition-specific associations between different types of features across different biological conditions, such as cancer types or cell populations. Therefore, we developed the sparse tensor-based partial least squares (sTPLS) method, which integrates multiple pairs of datasets containing two types of features but derived from different biological conditions. We demonstrated the effectiveness and versatility of sTPLS through a simulation study and three biological applications. By integrating pairwise pharmacogenomic data, sTPLS identified 11 gene-drug comodules with high biological functional relevance specific to seven cancer types and two comodules shared across multiple cancer types, such as breast, ovarian, and colorectal cancers. When applied to single-cell data, it uncovered nine gene-peak comodules representing transcriptional regulatory relationships specific to five cell types and three comodules shared across similar cell types, such as intermediate and naïve B cells. Furthermore, sTPLS can be directly applied to tensor-structured data, successfully revealing shared and distinct cell communication patterns mediated by the MK signaling pathway in coronavirus disease 2019 patients and healthy controls. 
These results highlight the effectiveness of sTPLS in identifying biologically meaningful relationships across diverse conditions, making it useful for multi-omics integrative analysis.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12031727/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143959543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MUTATE: a human genetic atlas of multiorgan artificial intelligence endophenotypes using genome-wide association summary statistics.","authors":"Aleix Boquet-Pujadas, Jian Zeng, Ye Ella Tian, Zhijian Yang, Li Shen, Andrew Zalesky, Christos Davatzikos, Junhao Wen","doi":"10.1093/bib/bbaf125","DOIUrl":"10.1093/bib/bbaf125","url":null,"abstract":"<p><p>Artificial intelligence (AI) has been increasingly integrated into imaging genetics to provide intermediate phenotypes (i.e. endophenotypes) that bridge the genetics and clinical manifestations of human disease. However, the genetic architecture of these AI endophenotypes remains largely unexplored in the context of human multiorgan system diseases. Using publicly available genome-wide association study summary statistics from the UK Biobank (UKBB), FinnGen, and the Psychiatric Genomics Consortium, we comprehensively depicted the genetic architecture of 2024 multiorgan AI endophenotypes (MAEs). We comparatively assessed the single-nucleotide polymorphism-based heritability, polygenicity, and natural selection signatures of 2024 MAEs using methods commonly used in the field. Genetic correlation and Mendelian randomization analyses reveal both within-organ relationships and cross-organ interconnections. Bi-directional causal relationships were established between chronic human diseases and MAEs across multiple organ systems, including Alzheimer's disease for the brain, diabetes for the metabolic system, asthma for the pulmonary system, and hypertension for the cardiovascular system. Finally, we derived polygenic risk scores for the 2024 MAEs for individuals not used to calculate MAEs and returned these to the UKBB. Our findings underscore the promise of the MAEs as new instruments to ameliorate overall human health. 
All results are encapsulated into the MUlTiorgan AI endophenoTypE genetic atlas and are publicly available at https://labs-laboratory.com/mutate.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11938998/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143708594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOMSCNet: a deep learning model for the classification of stomach cancer using multi-layer omics data.","authors":"Kasmika Borah, Himanish Shekhar Das, Ram Kaji Budhathoki, Khursheed Aurangzeb, Saurav Mallik","doi":"10.1093/bib/bbaf115","DOIUrl":"10.1093/bib/bbaf115","url":null,"abstract":"<p><p>The rapid advancement of next-generation sequencing (NGS) technology and the expanding availability of NGS datasets have led to a significant surge in biomedical research. To better understand the molecular processes underlying cancer and to support its development, diagnosis, prediction, and therapy, NGS data analysis is crucial. However, high-dimensional NGS multi-layer omics datasets are highly complex. In recent times, some computational methods have been developed for cancer omics data interpretation. However, various existing methods face challenges in accounting for diverse types of cancer omics data and struggle to effectively extract informative features for the integrated identification of core units. To address these challenges, we propose a hybrid feature selection (HFS) technique to detect optimal features from multi-layer omics datasets. Subsequently, this study proposes a novel hybrid deep recurrent neural network-based model, DOMSCNet, to classify stomach cancer. The proposed model was made generic for all four multi-layer omics datasets. To assess the robustness of the DOMSCNet model, it was validated with eight external datasets. Experimental results showed that the SelectKBest-maximum relevancy minimum redundancy-Boruta (SMB) HFS technique outperformed all other HFS techniques. 
Across four multi-layer omics datasets and the validation datasets, the proposed DOMSCNet model outperformed both existing classifiers and the other proposed classifiers.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11966610/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143771445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A review of neural networks for metagenomic binning.","authors":"Jair Herazo-Álvarez, Marco Mora, Sara Cuadros-Orellana, Karina Vilches-Ponce, Ruber Hernández-García","doi":"10.1093/bib/bbaf065","DOIUrl":"10.1093/bib/bbaf065","url":null,"abstract":"<p><p>One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is metagenomic binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to address this task. In this review, the contributions of artificial neural networks (ANNs) in the context of metagenomic binning are detailed, covering supervised, unsupervised, and semi-supervised approaches. Thirty-four ANN-based binning tools are systematically compared, detailing their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and optimization of architectures, for third-generation sequencing. 
This review provides support to researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 2","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11934572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143699297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}