Ning Wang, Minghui Wu, Wenchao Gu, Chenglong Dai, Zongru Shao, K P Subbalakshmi
{"title":"MSFT-transformer: a multistage fusion tabular transformer for disease prediction using metagenomic data.","authors":"Ning Wang, Minghui Wu, Wenchao Gu, Chenglong Dai, Zongru Shao, K P Subbalakshmi","doi":"10.1093/bib/bbaf217","DOIUrl":"10.1093/bib/bbaf217","url":null,"abstract":"<p><p>A growing number of recent studies highlight the crucial role of the human microbiome in maintaining health, while advances in metagenomic sequencing technologies have been accumulating data associated with human diseases. Although metagenomic data offer rich, multifaceted information, including taxonomic and functional abundance profiles, their full potential remains underutilized, as most approaches rely on only one type of information to discover and understand its correlations with disease occurrence. To address this limitation, we propose a multistage fusion tabular transformer architecture (MSFT-Transformer), aiming to effectively integrate various types of high-dimensional tabular information extracted from metagenomic data. Its multistage fusion strategy consists of three modules: a fusion-aware feature extraction module in the early stage to improve the information extracted from inputs, an alignment-enhanced fusion module in the mid stage to enforce the retention of desired information in cross-modal learning, and an integrated feature decision layer in the late stage to incorporate desired cross-modal information. We conduct extensive experiments to evaluate the performance of MSFT-Transformer against state-of-the-art models on five standard datasets. Our results indicate that MSFT-Transformer provides stable performance gains with reduced computational costs. An ablation study illustrates the contributions of all three modules compared with a reference multistage fusion transformer without these novel strategies. 
The result analysis implies the significant potential of the proposed model in future disease prediction with metagenomic data.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12078939/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144075906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An overview of computational methods in single-cell transcriptomic cell type annotation.","authors":"Tianhao Li, Zixuan Wang, Yuhang Liu, Sihan He, Quan Zou, Yongqing Zhang","doi":"10.1093/bib/bbaf207","DOIUrl":"10.1093/bib/bbaf207","url":null,"abstract":"<p><p>The rapid accumulation of single-cell RNA sequencing data has provided unprecedented computational resources for cell type annotation, significantly advancing our understanding of cellular heterogeneity. Leveraging gene expression profiles derived from transcriptomic data, researchers can accurately infer cell types, sparking the development of numerous innovative annotation methods. These methods utilize a range of strategies, including marker genes, correlation-based matching, and supervised learning, to classify cell types. In this review, we systematically examine these annotation approaches based on transcriptomics-specific gene expression profiles and provide a comprehensive comparison and categorization of these methods. Furthermore, we focus on the main challenges in the annotation process, especially the long-tail distribution problem arising from data imbalance in rare cell types. We discuss the potential of deep learning techniques to address these issues and enhance model capability in recognizing novel cell types within an open-world framework.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12065632/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143976188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mofan Feng, Liangjie Liu, Zhuo-Ning Xian, Xiaoxi Wei, Keyi Li, Wenqian Yan, Qing Lu, Yi Shi, Guang He
{"title":"PSTP: accurate residue-level phase separation prediction using protein conformational and language model embeddings.","authors":"Mofan Feng, Liangjie Liu, Zhuo-Ning Xian, Xiaoxi Wei, Keyi Li, Wenqian Yan, Qing Lu, Yi Shi, Guang He","doi":"10.1093/bib/bbaf171","DOIUrl":"10.1093/bib/bbaf171","url":null,"abstract":"<p><p>Phase separation (PS) is essential in cellular processes and disease mechanisms, highlighting the need for predictive algorithms to analyze uncharacterized sequences and accelerate experimental validation. Current high-accuracy methods often rely on extensive annotations or handcrafted features, limiting their generalizability to sequences lacking such annotations and making it difficult to identify key protein regions involved in PS. We introduce Phase Separation's Transfer-learning Prediction (PSTP), which combines conformational embeddings with large language model embeddings, enabling state-of-the-art PS predictions from protein sequences alone. PSTP performs well across various prediction scenarios and shows potential for predicting novel-designed artificial proteins. Additionally, PSTP provides residue-level predictions that are highly correlated with experimentally validated PS regions. By analyzing 160 000+ variants, PSTP characterizes the strong link between the incidence of pathogenic variants and residue-level PS propensities in unconserved intrinsically disordered regions, offering insights into underexplored mutation effects. PSTP's sliding-window optimization reduces its memory usage to a few hundred megabytes, facilitating rapid execution on typical CPUs and GPUs. 
Offered via both a web server and an installable Python package, PSTP provides a versatile tool for decoding protein PS behavior and supporting disease-focused research.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143964687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jingchen Zhai, Xiguang Qi, Lianjin Cai, Yue Liu, Haocheng Tang, Lei Xie, Junmei Wang
{"title":"NNKcat: deep neural network to predict catalytic constants (Kcat) by integrating protein sequence and substrate structure with enhanced data imbalance handling.","authors":"Jingchen Zhai, Xiguang Qi, Lianjin Cai, Yue Liu, Haocheng Tang, Lei Xie, Junmei Wang","doi":"10.1093/bib/bbaf212","DOIUrl":"10.1093/bib/bbaf212","url":null,"abstract":"<p><p>The catalytic constant (Kcat) describes the efficiency of enzyme-catalyzed reactions. The Kcat value of an enzyme-substrate pair indicates the rate at which an enzyme converts saturated substrates into product during the catalytic process. However, it is challenging to construct robust prediction models for this important property. Most existing models, including the one recently published in Nature Catalysis (Li et al.), suffer from overfitting. In this study, we propose a novel protocol to construct Kcat prediction models, introducing an intermediate step to separately develop substrate and protein processors. The substrate processor analyzes Simplified Molecular Input Line Entry System (SMILES) strings using a graph neural network model, Attentive FP, while the protein processor abstracts protein sequence information using a long short-term memory architecture. This protocol not only mitigates the impact of data imbalance in the original dataset but also provides greater flexibility in customizing the general-purpose Kcat prediction model to enhance the prediction accuracy for specific enzyme classes. Our general-purpose Kcat prediction model demonstrates significantly enhanced stability and slightly better accuracy (R2 value of 0.54 versus 0.50) in comparison with Li et al.'s model using the same dataset. Additionally, our modeling protocol enables fine-tuning of the general-purpose Kcat model for specific enzyme categories through focused learning. Using Cytochrome P450 (CYP450) enzymes as a case study, we achieved the best R2 value of 0.64 for the focused model. 
The high-quality performance and expandability of the model guarantee its broad applications in enzyme engineering and drug research & development.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12078937/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144075884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"StereoMM: a graph fusion model for integrating spatial transcriptomic data and pathological images.","authors":"Bingying Luo, Fei Teng, Guo Tang, Weixuan Cen, Xing Liu, Jinmiao Chen, Chi Qu, Xuanzhu Liu, Xin Liu, Wenyan Jiang, Huaqiang Huang, Yu Feng, Xue Zhang, Min Jian, Mei Li, Feng Xi, Guibo Li, Sha Liao, Ao Chen, Weimiao Yu, Xun Xu, Jiajun Zhang","doi":"10.1093/bib/bbaf210","DOIUrl":"10.1093/bib/bbaf210","url":null,"abstract":"<p><p>Spatial omics technologies, which generate high-throughput and multimodal data, have necessitated the development of advanced data integration methods to facilitate comprehensive biological and clinical treatment discoveries. Based on the cross-attention concept, we developed an AI-based toolchain called StereoMM, a graph-based fusion model that can incorporate omics data such as gene expression, histological images, and spatial location. StereoMM uses an attention module for omics data interaction and a graph autoencoder to integrate spatial positions and omics data in a self-supervised manner. Applying StereoMM across various cancer types and platforms has demonstrated its robust capability. StereoMM outperforms competitors in identifying spatial regions reflecting tumour progression and shows promise in classifying colorectal cancer patients into deficient mismatch repair and proficient mismatch repair groups. 
The comprehensive inter-modal integration and efficiency of StereoMM enable researchers to construct spatial views of integrated multimodal features efficiently, advancing thorough tissue and patient characterization.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12100622/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metabolism-associated protein network constructing and host-directed anti-influenza drug repurposing.","authors":"Hao Tang, Feng Jiang, Zhi Zhang, Jiaojiao Yang, Lu Li, Qingye Zhang","doi":"10.1093/bib/bbaf163","DOIUrl":"10.1093/bib/bbaf163","url":null,"abstract":"<p><p>Host-directed antivirals offer a promising strategy for addressing the challenge of viral resistance. Virus-host interactions often trigger stage-specific metabolic reprogramming in the host, and the causal links between these interactions and virus-induced metabolic changes provide valuable insights for identifying host targets. In this study, we present a workflow for repurposing host-directed antivirals using virus-induced protein networks. These networks capture the dynamic progression of viral infection by integrating host proteins directly interacting with the virus and enzymes associated with significantly altered metabolic fluxes, identified through dual-species genome-scale metabolic models. This approach reveals numerous hub nodes as potential host targets. As a case study, 50 approved drugs with potential anti-influenza virus A (IVA) activity were identified through eight stage-specific IVA-induced protein networks, each comprising 699-899 hub nodes. Lisinopril, saxagliptin, and gliclazide were further validated for anti-IVA efficacy in vitro through assays measuring the inhibition of cytopathic effects and viral titers in A549 cells infected with IVA PR8. 
This workflow paves the way for the rapid repurposing of host-directed antivirals.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12048005/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143963033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel prognostic framework for HBV-infected hepatocellular carcinoma: insights from ferroptosis and iron metabolism proteomics.","authors":"Zhiwei Cheng, Yongyong Ren, Xinbo Wang, Yuening Zhang, Yingqi Hua, Hongyu Zhao, Hui Lu","doi":"10.1093/bib/bbaf216","DOIUrl":"10.1093/bib/bbaf216","url":null,"abstract":"<p><p>Effective classification methods and prognostic models enable more accurate classification and treatment of hepatocellular carcinoma (HCC) patients. However, the weak correlation between RNA and protein data has limited the clinical utility of previous RNA-based prognostic models for HCC. In this work, we constructed a novel prognostic framework for HCC patients using seven differentially expressed proteins associated with ferroptosis and iron metabolism. Furthermore, this prognostic model robustly classifies HCC patients into three clinically relevant risk groups. Significant differences in overall survival, age, tumor differentiation, microvascular invasion, distant metastasis, and alpha-fetoprotein levels were observed among the risk groups. Based on the prognostic model and known biological pathways, we explored the potential mechanisms underlying the inconsistent differential expression patterns of FTH1 (Ferritin heavy chain 1) mRNA and protein. Our findings demonstrated that tumor tissues in HCC patients promote liver cancer progression by downregulating FTH1 protein expression, rather than upregulating FTH1 mRNA expression, ultimately leading to poor prognosis. Subsequently, based on risk score and tumor size, we developed a nomogram for predicting the prognosis of HCC patients, which demonstrated superior predictive performance in both the training and validation cohorts (C-index: 0.774; AUC for 1-5 years: 0.783-0.964). 
Additionally, our findings demonstrated that the adverse prognosis of high-risk HCC patients was closely correlated with ferroptosis in liver cancer tissues, alterations in iron metabolism, and changes in the tumor immune microenvironment. In conclusion, our prognostic model and predictive nomogram offer novel insights and tools for the effective classification of HCC patients, potentially enhancing clinical decision-making and outcomes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12085197/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144092840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view clustering for single-cell RNA-seq data based on graph fusion.","authors":"Jing Wang, Junfeng Xia, Dayu Tan, Yunjie Ma, Yansen Su, Chun-Hou Zheng","doi":"10.1093/bib/bbaf193","DOIUrl":"10.1093/bib/bbaf193","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) provides transcriptome profiling of individual cells, allowing for in-depth studies of cell heterogeneity at single-cell resolution. While cell clustering lays the foundation of scRNA-seq data analysis, the high dimensionality and frequent dropout events of the data raise great challenges. Although plenty of dedicated clustering methods have been proposed, they often fail to fully explore the underlying data structure. Here, we introduce scMCGF, a new multi-view clustering algorithm based on graph fusion. It utilizes multi-view data generated from transcriptomic data to learn the consistent and complementary information across different views, ultimately constructing a unified graph matrix for robust cell clustering. Specifically, scMCGF uses two dimensionality-reduction methods (principal component analysis and diffusion maps) to capture both linear and non-linear characteristics of the data. Additionally, it calculates a cell-pathway score matrix to incorporate pathway-level information. These three features, along with the pre-processed gene expression data, form the multi-view data. scMCGF iteratively refines the structure of the similarity graph of each view through adaptive learning and learns a unified graph matrix by weighting and fusing the individual similarity graph matrices. The final clustering results are obtained by applying the rank constraint on the Laplacian matrix of the unified graph matrix. Experimental results on 13 real datasets show that scMCGF outperforms eight state-of-the-art methods in clustering accuracy and robustness. 
Furthermore, biological analysis validates that the clustering results of scMCGF provide a reliable foundation for downstream investigations.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103903/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144141452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kaizhuang Jing, Tingchu Wei, Xuedie Gu, Guoliang Lin, Lin Liu, Jing Luo
{"title":"Deep learning reveals determinants of transcriptional infidelity at nucleotide resolution in the allopolyploid line by goldfish and common carp hybrids.","authors":"Kaizhuang Jing, Tingchu Wei, Xuedie Gu, Guoliang Lin, Lin Liu, Jing Luo","doi":"10.1093/bib/bbaf260","DOIUrl":"10.1093/bib/bbaf260","url":null,"abstract":"<p><p>According to the central dogma, DNA is transcribed into corresponding RNA sequences based on the principle of complementary base pairing. However, in the allopolyploid line derived from goldfish and common carp hybrids, there is a significant level of transcriptional infidelity. To investigate the causes of transcriptional infidelity in this line, we developed a deep learning model to identify its underlying determinants. First, our model can accurately identify transcriptional infidelity sequences at nucleotide resolution and effectively distinguish transcriptional infidelity regions at the subregional level. Subsequently, we utilized this model to quantitatively assess the importance of position-specific motifs. Furthermore, by integrating the relationship between transcription factors and their recognition motifs, we unveiled the distribution of position-specific transcription factor families and classes that influence transcriptional infidelity in this line. 
In summary, our study provides new insights into the deeper determinants of transcriptional infidelity in this line.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12140016/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144233231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tae Hyun Kim, Harim Kim, Hyunjin Hwang, Shinwhan Kang, Kijung Shin, Inwha Baek
{"title":"Gene expression inference based on graph neural networks using L1000 data.","authors":"Tae Hyun Kim, Harim Kim, Hyunjin Hwang, Shinwhan Kang, Kijung Shin, Inwha Baek","doi":"10.1093/bib/bbaf273","DOIUrl":"10.1093/bib/bbaf273","url":null,"abstract":"<p><p>Gene expression profiles can serve as proxies for cellular states and provide valuable insights into the discovery of functional connections across diverse cellular contexts. A cost-effective method called L1000 has been developed to generate gene expression profiles for over a million different conditions. Since gene expression inference of this method relies on linear regression, nonlinear regression methods, including deep learning models, have been assessed. However, these approaches process gene expression data as a vector structure, motivating us to investigate whether nonlinear models based on a graph structure are more effective in capturing the relationships between genes underlying gene expression profiles. In this work, we show that the graph neural network (GNN) model with genes as nodes outperforms both linear and nonlinear non-GNN models in predicting gene expression values and expression-based gene rankings. Importantly, our GNN model requires ~10-fold less information than other models to achieve comparable performance. A strategic selection of input features, or incorporating an organ feature, from which the gene expression data are derived, further improves gene expression inference performance of the GNN model. Additionally, we evaluate the cross-platform generality of gene expression inference. 
Our study demonstrates that the transformation of RNA expression data into a graph structure effectively captures nonlinear correlations between genes, thereby enabling highly accurate and efficient prediction of gene expression profiles.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12161499/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144282406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}