Peidong Zhang, Xingang Peng, Rong Han, Ting Chen, Jianzhu Ma
{"title":"Rag2Mol: structure-based drug design based on retrieval augmented generation.","authors":"Peidong Zhang, Xingang Peng, Rong Han, Ting Chen, Jianzhu Ma","doi":"10.1093/bib/bbaf265","DOIUrl":"10.1093/bib/bbaf265","url":null,"abstract":"<p><p>Artificial intelligence (AI) has brought tremendous progress to drug discovery, yet identifying hit and lead compounds with optimal physicochemical and pharmacological properties remains a significant challenge. Structure-based drug design (SBDD) has emerged as a promising paradigm, but inherent data biases and neglect of synthetic accessibility render SBDD models disconnected from practical drug discovery. In this work, we explore two methodologies, Rag2Mol-G and Rag2Mol-R, both based on retrieval-augmented generation, to design small molecules that fit a 3D pocket. These two methods involve either searching the database for purchasable small molecules similar to the generated ones, or creating new molecules that fit a 3D pocket from those in the database. Experimental results demonstrate that Rag2Mol methods consistently produce drug candidates with superior binding affinities and drug-likeness. We find that Rag2Mol-R provides broader coverage of the chemical landscape and more precise targeting capability than advanced virtual screening models. Notably, both workflows identified promising inhibitors for the challenging target protein tyrosine phosphatase PTPN2, which was once considered undruggable and still lacks inhibitors that have completed full clinical trials. 
Our highly extensible framework can integrate diverse SBDD methods, marking a significant advancement in AI-driven SBDD.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159289/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhammad Amjad Nawaz, Igor Eduardovich Pamirsky, Kirill Sergeevich Golokhvast
{"title":"Response to letter to editor \"On 'Bioinformatics in Russia: history and present-day landscape' by M.A. Nawaz, I.E. Pamirsky, and K.S. Golokhvast\" by Mikhail Gelfand.","authors":"Muhammad Amjad Nawaz, Igor Eduardovich Pamirsky, Kirill Sergeevich Golokhvast","doi":"10.1093/bib/bbaf172","DOIUrl":"10.1093/bib/bbaf172","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101726/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144131822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chrombus-XMBD: a graph convolution model predicting 3D-genome from chromatin features.","authors":"Yuanyuan Zeng, Zhiyu You, Jiayang Guo, Jialin Zhao, Ying Zhou, Jialiang Huang, Xiaowen Lyu, Longbiao Chen, Qiyuan Li","doi":"10.1093/bib/bbaf183","DOIUrl":"10.1093/bib/bbaf183","url":null,"abstract":"<p><p>The 3D conformation of the chromatin is crucial for transcriptional regulation. However, current experimental techniques for detecting the 3D structure of the genome are costly and limited to the biological conditions. Here, we describe \"Chrombus-XMBD,\" a graph convolution model capable of predicting chromatin interactions ab initio based on available chromatin features. Using dynamic edge convolution with a multihead attention mechanism, Chrombus encodes the 2D chromatin features into a learnable embedding space, thereby generating a genome-wide 3D contact map. In validation, Chrombus effectively recapitulated the topologically associating domains, expression quantitative trait loci, and promoter/enhancer interactions. In particular, Chrombus outperforms existing algorithms in predicting chromatin interactions over 1-2 Mb, increasing prediction correlation by 11.8%-48.7%, and predicts long-range interactions over 2 Mb (Pearson's coefficient 0.243-0.582). Chrombus also exhibits strong generalizability across human and mouse-derived cell lines. Additionally, the parameters of Chrombus inform the biological mechanisms underlying the cistrome. 
Our model provides a new, generalizable analytical tool for understanding the complex dynamics of chromatin interactions and the landscape of cis-regulation of gene expression.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143959162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lianchong Gao, Yujun Liu, Jiawei Zou, Fulan Deng, Zheqi Liu, Zhen Zhang, Xinran Zhao, Lei Chen, Henry H Y Tong, Yuan Ji, Huangying Le, Xin Zou, Jie Hao
{"title":"Deep scSTAR: leveraging deep learning for the extraction and enhancement of phenotype-associated features from single-cell RNA sequencing and spatial transcriptomics data.","authors":"Lianchong Gao, Yujun Liu, Jiawei Zou, Fulan Deng, Zheqi Liu, Zhen Zhang, Xinran Zhao, Lei Chen, Henry H Y Tong, Yuan Ji, Huangying Le, Xin Zou, Jie Hao","doi":"10.1093/bib/bbaf160","DOIUrl":"10.1093/bib/bbaf160","url":null,"abstract":"<p><p>Single-cell sequencing has advanced our understanding of cellular heterogeneity and disease pathology, offering insights into cellular behavior and immune mechanisms. However, extracting meaningful phenotype-related features is challenging due to noise, batch effects, and irrelevant biological signals. To address this, we introduce Deep scSTAR (DscSTAR), a deep learning-based tool designed to enhance phenotype-associated features. DscSTAR identified HSP+ FKBP4+ T cells among CD8+ T cells, which are linked to immune dysfunction and resistance to immune checkpoint blockade in non-small cell lung cancer. It also enhanced spatial transcriptomics analysis of renal cell carcinoma, revealing interactions between cancer cells, CD8+ T cells, and tumor-associated macrophages that may promote immune suppression and affect outcomes. In hepatocellular carcinoma, it highlighted the role of S100A12+ neutrophils and cancer-associated fibroblasts in forming tumor immune barriers and potentially contributing to immunotherapy resistance. 
These findings demonstrate DscSTAR's capacity to model and extract phenotype-specific information, advancing our understanding of disease mechanisms and therapy resistance.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047704/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143959425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Suresh Pokharel, Kepha Barasa, Pawel Pratyush, Dukka B Kc
{"title":"PLM-DBPs: enhancing plant DNA-binding protein prediction by integrating sequence-based and structure-aware protein language models.","authors":"Suresh Pokharel, Kepha Barasa, Pawel Pratyush, Dukka B Kc","doi":"10.1093/bib/bbaf245","DOIUrl":"10.1093/bib/bbaf245","url":null,"abstract":"<p><p>DNA-binding proteins (DBPs) play a crucial role in gene regulation, development, and environmental responses across plants, animals, and microorganisms. Existing DBP prediction methods are largely limited to sequence information, whether through handcrafted features or sequence-based protein language models (PLMs), overlooking structural cues critical to protein function. In addition, most existing tools are trained for general DBP prediction and are often not accurate for plant-specific DBPs due to the unique structural and functional properties of plant proteins. Our work introduces PLM-DBPs, a deep learning framework that integrates both sequence-based and structure-aware representations to enhance DBP prediction in plants. We evaluated several state-of-the-art PLMs to extract high-dimensional protein representations and experimented with various fusion strategies to validate the complementary information between the various representations. Our final model, a fusion of sequence-based and structure-aware ANN models, achieves a notable improvement in predicting DBPs in plants, outperforming previous state-of-the-art models. Although sequence-based PLMs already demonstrate strong performance in DBP prediction, our findings show that the integration of structural information further enhances predictive accuracy. This underscores the complementary nature of structural representations and establishes PLM-DBPs as a robust tool for advancing plant research and agricultural innovation. 
The proposed model and other resources are publicly available at https://github.com/suresh-pokharel/PLM-DBPs.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121366/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sunghong Park, Dong-Gi Lee, Juhyeon Kim, Seung Ho Kim, Hyeon Jin Hwang, Hyunjung Shin, Hyun Goo Woo
{"title":"PPIxGPN: plasma proteomic profiling of neurodegenerative biomarkers with protein-protein interaction-based eXplainable graph propagational network.","authors":"Sunghong Park, Dong-Gi Lee, Juhyeon Kim, Seung Ho Kim, Hyeon Jin Hwang, Hyunjung Shin, Hyun Goo Woo","doi":"10.1093/bib/bbaf213","DOIUrl":"10.1093/bib/bbaf213","url":null,"abstract":"<p><p>Neurodegenerative diseases involve progressive neuronal dysfunction, requiring the identification of specific pathological features for accurate diagnosis. While cerebrospinal fluid analysis and neuroimaging are commonly used, their invasive nature and high costs limit clinical applicability. Recent advances in plasma proteomics offer a less invasive and cost-effective alternative, further enhanced by machine learning (ML). However, most ML-based studies overlook synergetic effects from protein-protein interactions (PPIs), which play a key role in disease mechanisms. Although graph convolutional networks and their extensions can utilize PPIs, they rely on locality-based feature aggregation, overlooking essential components and emphasizing noisy interactions. Moreover, expanding those methods to cover broader PPIs results in complex model architectures that reduce explainability, which is crucial in medical ML models for clinical decision-making. To address these challenges, we propose Protein-Protein Interaction-based eXplainable Graph Propagational Network (PPIxGPN), a novel ML model designed for plasma proteomic profiling of neurodegenerative biomarkers. PPIxGPN captures synergetic effects between proteins by integrating PPIs with independent effects of proteins, leveraging globality-based feature aggregation to represent comprehensive PPI properties. This process is implemented using a single graph propagational layer, enabling PPIxGPN to be configured with a shallow architecture. PPIxGPN thereby ensures high model explainability, enhancing clinical applicability by providing interpretable outputs. 
Experimental validation on the UK Biobank dataset demonstrated the superior performance of PPIxGPN in neurodegenerative risk prediction, outperforming comparison methods. Furthermore, the explainability of PPIxGPN facilitated detailed analyses of the discriminative significance of synergistic effects, the predictive importance of proteins, and the longitudinal changes in biomarker profiles, highlighting its clinical relevance.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121361/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProtPhage: a deep learning framework for phage viral protein identification and functional annotation.","authors":"Yuehua Ou, Qiyi Chen, Ningyu Zhong, Zhihua Du","doi":"10.1093/bib/bbaf285","DOIUrl":"10.1093/bib/bbaf285","url":null,"abstract":"<p><p>Phages, viruses that infect bacteria, offer a promising strategy against antibiotic-resistant pathogens. Phage viral proteins (PVPs) are essential for phage-host interactions, yet their identification and functional annotation remain challenging due to high sequence diversity, limited experimental data, and class imbalance. To address these issues, we propose ProtPhage, a novel framework that leverages the ProtT5 protein language model for richer sequence representation compared to traditional methods. Additionally, ProtPhage incorporates an asymmetric loss function to mitigate class imbalance, significantly improving the prediction of the minority class \"minor capsid,\" with an F1 score 33.07% higher than the best existing model. Extensive experiments demonstrate that ProtPhage outperforms current methods across multiple metrics, including accuracy, precision, recall, and F1 score. A case study on the Mycobacterium phage PDRPxv genome further validates its practical utility, while expanded experiments highlight its potential in phage-host prediction. 
By integrating advanced deep learning techniques, ProtPhage establishes a new standard for PVP identification and annotation, contributing to the broader field of computational phage biology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165830/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144293341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Stolnicu, Nensi Ikonomi, Peter Eckhardt-Bellmann, Johann M Kraus, Hans A Kestler
{"title":"Robust signalling entropy estimation for biological process characterisation.","authors":"Ana Stolnicu, Nensi Ikonomi, Peter Eckhardt-Bellmann, Johann M Kraus, Hans A Kestler","doi":"10.1093/bib/bbaf269","DOIUrl":"https://doi.org/10.1093/bib/bbaf269","url":null,"abstract":"<p><strong>Motivation: </strong>Signalling entropy measures the uncertainty or randomness in the signalling pathways of a biological system. It reflects the complexity and variability of protein interactions and can indicate how information is processed within cells. Higher signalling entropy often indicates a more dynamic and adaptive state, whereas lower entropy may imply a more stable and less responsive condition. Estimating signalling entropy has become a valuable method for studying and understanding the complexity of biological processes. This measure has the potential to yield valuable insights into various phenomena, including the mechanisms behind cell fate decisions, drug resistance, and disease progression. To examine the molecular changes within a system, signalling entropy is quantified through the integration of expression measurements and protein interaction networks. Experimental and computational issues, such as false positives and additional noise, can compromise the accuracy of protein interaction networks. Correction methods can be used to mitigate spurious results, correct for experimental bias, and integrate data from multiple sources. However, to date, the effect of such approaches on entropy calculations, together with the impact of different underlying networks, has yet to be evaluated.</p><p><strong>Results: </strong>Here, we investigate how the topology of distinct protein interaction networks can alter the entropy calculation. We examine the entropy derived from different protein interaction networks. 
Additionally, we systematically evaluate different correction strategies, outlining their benefits and drawbacks along with identifying the most effective approaches for specific types of data and biological scenarios. This protocol outlines how to optimize the reliability of entropy calculations and ultimately leads to a deeper comprehension of biological processes and disease mechanisms.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144324527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VirNucPro: an identifier for the identification of viral short sequences using six-frame translation and large language models.","authors":"Jing Li, Jia Mi, Wei Lin, Fengjuan Tian, Jing Wan, Jingyang Gao, Yigang Tong","doi":"10.1093/bib/bbaf224","DOIUrl":"10.1093/bib/bbaf224","url":null,"abstract":"<p><p>Viruses are ubiquitous in nature, yet our understanding of them remains limited. High-throughput sequencing technology facilitates the unbiased revelation of genetic composition in samples; however, viral sequences typically make up a small proportion of the entire sequencing data, making it challenging to accurately identify the few or fragmented viral sequences present in a sample. The limited features and information provided by short sequences result in insufficient resolution of viral sequences by existing models. Therefore, we propose a new model, VirNucPro, for short viral sequence identification. Based on a six-frame translation strategy and large language models, we combine nucleotide and amino acid sequence information to enhance feature extraction for short sequences, achieving high accuracy in identifying short viral sequences. Ablation experiments compared the contributions of nucleotide and amino acid sequence features to the model, confirming that the introduced amino acid features significantly contribute to the classification results. Our model outperforms others, such as GCNFrame, DeepVirFinder, DETIRE, and Virtifier, which have demonstrated good performance in identifying short viral sequences of 300 and 500 bp. Our model demonstrates excellent performance on carefully created real-world datasets. Additionally, it can scan for prophage regions within long bacterial fragments, offering a wide range of applications. 
The codes are available at: https://github.com/Li-Jing-1997/VirNucPro.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12086996/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144092778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances and critical aspects in cancer treatment development using digital twins.","authors":"Rym Bouriga, Caroline Bailleux, Jocelyn Gal, Emmanuel Chamorey, Baharia Mograbi, Jean-Michel Hannoun-Levi, Gerard Milano","doi":"10.1093/bib/bbaf237","DOIUrl":"10.1093/bib/bbaf237","url":null,"abstract":"<p><p>The emergence of digital twins (DTs) in the arena of anticancer treatment echoes the transformative impact of artificial intelligence in drug development. DTs provide dynamic, accessible platforms that may accurately replicate patient and tumor characteristics. The potential of DTs in clinical investigation is particularly compelling. By comparing data from virtual trials with conventional trial results, medical teams can significantly enhance the reliability of their studies. Moreover, a significant breakthrough in clinical research is the ability of DTs to augment patient data during ongoing trials, enabling adaptive trial designs and more robust statistical analyses to be performed even with limited patient populations. The development of DTs, however, faces several technical and methodological challenges. These include their tendency to produce unreliable predictions, non-factual information, reasoning errors, systematic biases, and a lack of interpretability. Future research in this field should focus on an interdisciplinary approach that brings together experts from diverse fields, including mathematicians, biologists, and physicians. 
This collaborative strategy promises to unlock new frontiers in personalized cancer treatment and medical methodologies.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12130972/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144207735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}