{"title":"MambaPhase: deep learning for liquid-liquid phase separation protein classification.","authors":"Jianwei Huang, Youli Zhang, Shulin Ren, Ziyang Wang, Xiaocheng Jin, Xiaoli Lu, Yu Zhang, Xiaoping Min, Shengxiang Ge, Jun Zhang, Ningshao Xia","doi":"10.1093/bib/bbaf230","DOIUrl":"10.1093/bib/bbaf230","url":null,"abstract":"<p><p>Liquid-liquid phase separation plays a critical role in cellular processes, including protein aggregation and RNA metabolism, by forming membraneless subcellular structures. Accurate identification of phase-separated proteins is essential for understanding and controlling these processes. Traditional identification methods are effective but often costly and time-consuming. Recent machine learning methods have reduced these costs, but most models are restricted to classifying scaffold and client proteins under limited experimental conditions. To address this limitation, we developed a Mamba-based encoder using contrastive learning that incorporates separation probability, protein type, and experimental conditions. Our model achieved 95.2% accuracy in predicting phase-separated proteins and an ROC-AUC score of 0.87 in classifying scaffold and client proteins. Further validation in the DgHBP-2 drug delivery system demonstrated its potential for condition modulation in drug development. This study provides an effective framework for the accurate identification and control of phase separation, facilitating advancements in biomedical research and therapeutic applications.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107247/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal geometric learning for antimicrobial peptide identification by leveraging alphafold2-predicted structures and surface features.","authors":"Zehua Sun, Jing Xu, Yumeng Zhang, Yiwen Zhang, Zhikang Wang, Xiaoyu Wang, Shanshan Li, Yuming Guo, Hsin Hui Shen, Jiangning Song","doi":"10.1093/bib/bbaf261","DOIUrl":"10.1093/bib/bbaf261","url":null,"abstract":"<p><p>Antimicrobial peptides (AMPs) are short peptides that play critical roles in diverse biological processes and exhibit functional activities against target organisms. Although numerous methods have demonstrated the effectiveness of deep neural networks for AMP identification using sequence features, higher-level peptide characteristics, such as 3D structure and geometric surface features, have not been comprehensively explored. To address this gap, we introduce the SSFGM-Model (Sequence, Structure, Surface, Graph, and Geometric-based Model), a novel framework that integrates multiple feature types to enhance AMP identification. The model represents each peptide sequence as a graph, where nodes are characterized by amino acid features derived from ProteinBERT, ESM-2, and One-hot embeddings. Graph convolutional networks and an attention mechanism are employed to capture high-order structural and sequential relationships. Additionally, surface geometry and physicochemical properties are processed using a geometric neural network. Finally, a feature fusion strategy combines the outputs from these subnetworks to enable robust AMP identification. Extensive benchmarking experiments demonstrate that the SSFGM-Model outperforms current state-of-the-art methods. An ablation study further confirms the critical role of sequence, structural, and surface features in AMP identification. The key contribution of this work is the innovative integration of multiple levels of peptide characteristics and the combination of geometric and graph neural networks. This approach provides a more comprehensive understanding of the sequence-structure-function relationship of peptides, paving the way for more accurate AMP prediction. The SSFGM-Model has significant potential for applications in the discovery and design of novel AMP-based therapeutics. The source code is publicly available at https://github.com/ggcameronnogg/SSFGM-Model.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133682/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCPline: An interactive framework for the single-cell proteomics data preprocessing.","authors":"Shangwei Guo, Shengming Zhou, Guohua Wang, Fang Wang","doi":"10.1093/bib/bbaf256","DOIUrl":"10.1093/bib/bbaf256","url":null,"abstract":"<p><p>Single-cell proteomics has advanced our understanding of cellular complexity by enabling detailed analysis of protein expression at the single-cell level. However, challenges such as data sparsity, variability, and noise require sophisticated computational solutions. SCPline addresses these by offering a comprehensive data preprocessing and analysis platform specifically for single-cell proteomics. It supports mass spectrometry-based, antibody-based, and multi-omics approaches, performing quality screening, normalization, dimensionality reduction, and clustering for each data type (https://bioinform.nefu.edu.cn/ScPline/). Each module includes tailored functions and visualizations for easy quality checks, allowing researchers with limited programming experience to efficiently preprocess data. By streamlining complex workflows, SCPline makes advanced computational tools accessible, enabling researchers to explore cellular heterogeneity and biological states, thus accelerating discoveries in developmental biology, disease pathogenesis, and therapeutic responses. Additionally, SCPline enhances reproducibility and rigor in proteomics research, contributing to breakthroughs in understanding cellular behavior and identifying novel therapeutic targets, shaping the future of biomedical research and precision medicine.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133681/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"I-SVVS: integrative stochastic variational variable selection to explore joint patterns of multi-omics microbiome data.","authors":"Tung Dang, Yushiro Fuji, Kie Kumaishi, Erika Usui, Shungo Kobori, Takumi Sato, Megumi Narukawa, Yusuke Toda, Kengo Sakurai, Yuji Yamasaki, Hisashi Tsujimoto, Masami Yokota Hirai, Yasunori Ichihashi, Hiroyoshi Iwata","doi":"10.1093/bib/bbaf132","DOIUrl":"10.1093/bib/bbaf132","url":null,"abstract":"<p><p>High-dimensional multi-omics microbiome data play an important role in elucidating microbial community interactions with their hosts and environment in critical diseases and ecological changes. Although Bayesian clustering methods have recently been used for the integrated analysis of multi-omics data, no method designed to analyze multi-omics microbiome data has been proposed. In this study, we propose a novel framework called integrative stochastic variational variable selection (I-SVVS), which is an extension of stochastic variational variable selection for high-dimensional microbiome data. The I-SVVS approach applies a specific Bayesian mixture model to each type of omics data, such as an infinite Dirichlet multinomial mixture model for microbiome data and an infinite Gaussian mixture model for metabolomic data. This approach is expected to reduce the computational time of the clustering process and improve the accuracy of the clustering results. Additionally, I-SVVS identifies a critical set of representative variables in multi-omics microbiome data. Three datasets from soybean, mice, and humans (each integrating microbiome and metabolome data) were used to demonstrate the potential of I-SVVS. The results indicate that I-SVVS achieved improved accuracy and faster computation compared to existing methods across all test datasets. It effectively identified key microbiome species and metabolites characterizing each cluster. For instance, the computational analysis of the soybean dataset, including 377 samples with 16 943 microbiome species and 265 metabolome features, was completed in 2.18 hours using I-SVVS, compared to 2.35 days with Clusternomics and 1.12 days with iClusterPlus. The software for this analysis, written in Python, is freely available at https://github.com/tungtokyo1108/I-SVVS.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122083/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144180721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulation-guided pan-cancer analysis identifies a novel regulator of CpG island hypermethylation heterogeneity.","authors":"Xianglin Zhang, Wei Zhang, Jinyi Zhang, Xiuhong Lyu, Haoran Pan, Tianwei Jia, Ting Wang, Xiaowo Wang, Haiyang Guo","doi":"10.1093/bib/bbaf252","DOIUrl":"10.1093/bib/bbaf252","url":null,"abstract":"<p><p>CpG island hypermethylation, a hallmark of cancer, exhibits substantial heterogeneity across tumors, presenting both opportunities and challenges for cancer diagnostics and therapeutics. While this heterogeneity offers potential for patient stratification to predict clinical outcomes and personalize treatments, it complicates the development of robust biomarkers for early detection. Understanding the mechanisms driving this heterogeneity is essential for advancing biomarker design. Here, simulation-based analyses demonstrate that tumor purity and the high prevalence of low epi-mutation samples significantly obscure the identification of negative, rather than positive, regulators of CpG island hypermethylation, limiting a comprehensive understanding of heterogeneity sources. By addressing these confounders, we identify impaired DNA methylation maintenance, as indicated by global hypomethylation levels, as the primary contributor to CpG island hypermethylation variability among known regulators. This finding is supported by integrative analyses of datasets from The Cancer Genome Atlas (TCGA) Pan-Cancer Atlas, Genomics of Drug Sensitivity in Cancer (GDSC1000) cancer cell lines, and epi-allele analyses of two independent whole-genome bisulfite sequencing cohorts, using a newly developed method, MeHist (https://github.com/vhang072/MeHist). Furthermore, we assess widely used hypermethylation biomarkers across ten cancer types and find that 65 out of 246 (26.4%) are significantly influenced by impaired methylation maintenance. Incorporating hypomethylation and hypermethylation markers improves the robustness of cancer detection, as validated across multiple plasma cell-free DNA datasets. In summary, our findings highlight the value of simulation-guided integrative analysis in mitigating confounding effects and identify impaired DNA methylation maintenance as a key regulator of CpG island hypermethylation heterogeneity.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12127147/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144198280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepRNA-Twist: language-model-guided RNA torsion angle prediction with attention-inception network.","authors":"Abrar Rahman Abir, Md Toki Tahmid, Rafiqul Islam Rayan, M Saifur Rahman","doi":"10.1093/bib/bbaf199","DOIUrl":"10.1093/bib/bbaf199","url":null,"abstract":"<p><p>RNA torsion and pseudo-torsion angles are critical in determining the three-dimensional conformation of RNA molecules, which in turn governs their biological functions. However, current methods are limited by RNA's structural complexity and flexibility, with experimental techniques being costly and computational approaches struggling to capture the intricate sequence dependencies needed for accurate predictions. To address these challenges, we introduce DeepRNA-Twist, a novel deep learning framework designed to predict RNA torsion and pseudo-torsion angles directly from sequence. DeepRNA-Twist utilizes RNA language model embeddings, which provide rich, context-aware feature representations of RNA sequences. Additionally, it introduces the 2A3IDC module (Attention Augmented Inception Inside Inception with Dilated CNN), which combines inception networks with dilated convolutions and a multi-head attention mechanism. The dilated convolutions capture long-range dependencies in the sequence without requiring a large number of parameters, while the multi-head attention mechanism enhances the model's ability to focus on both local and global structural features simultaneously. DeepRNA-Twist was rigorously evaluated on benchmark datasets, including RNA-Puzzles, CASP-RNA, and SPOT-RNA-1D, and demonstrated significant improvements over existing methods, achieving state-of-the-art accuracy. Source code is available at https://github.com/abrarrahmanabir/DeepRNA-Twist.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12047705/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143971183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-objective computational optimization of human 5' UTR sequences.","authors":"Keisuke Yamada, Kanta Suga, Naoko Abe, Koji Hashimoto, Susumu Tsutsumi, Masahito Inagaki, Fumitaka Hashiya, Hiroshi Abe, Michiaki Hamada","doi":"10.1093/bib/bbaf225","DOIUrl":"10.1093/bib/bbaf225","url":null,"abstract":"<p><p>The computational design of messenger RNA (mRNA) sequences is a critical technology for both scientific research and industrial applications. Recent advances in prediction and optimization models have enabled the automatic scoring and optimization of 5′ UTR sequences, key upstream elements of mRNA. However, fully automated design of 5′ UTR sequences with more than two objective scores has not yet been explored. In this study, we present a computational pipeline that optimizes human 5′ UTR sequences in a multi-objective framework, addressing up to four distinct and conflicting objectives. Our work represents an important advancement in the multi-objective computational design of mRNA sequences, paving the way for more sophisticated mRNA engineering.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103902/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144141511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"De-motif sampling: an approach to decompose hierarchical motifs with applications in T cell recognition.","authors":"Xinyi Tang, Ran Liu","doi":"10.1093/bib/bbaf221","DOIUrl":"10.1093/bib/bbaf221","url":null,"abstract":"<p><p>T cell immune recognition requires the interactions among antigen peptides, Major Histocompatibility Complex (MHC) molecules, and T cell receptors (TCRs). While research into the interactions between MHC and peptides is well established, the specific preferences of TCRs for peptides remain less understood. This gap largely stems from the requirement that antigen peptides must be bound to MHC and presented on the cell surface prior to recognition by TCRs. Typically, motifs related to TCR recognition are influenced by MHC characteristics, limiting the direct identification of TCR-specific motifs. To address this challenge, this study introduces a Bayesian method designed to decompose hierarchical motifs independently of MHC constraints. This model, rigorously tested through comprehensive simulation experiments and applied to real data, establishes a clear hierarchical structure for motifs related to T cell recognition.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144076073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TARGET-SL: precision essential gene prediction using driver prioritisation and synthetic lethality.","authors":"Rhys Gillman, Matt A Field, Ulf Schmitz, Lionel Hebbard","doi":"10.1093/bib/bbaf255","DOIUrl":"10.1093/bib/bbaf255","url":null,"abstract":"<p><p>The ability to identify patient-specific vulnerabilities to guide cancer treatments is a vital area of research. However, predictive bioinformatics tools are difficult to translate into clinical applications due to a lack of in vitro and in vivo validation. While the increasing number of personalised driver prioritisation algorithms (PDPAs) report powerful patient-specific information, the results do not easily translate into treatment strategies. Critical in addressing this gap is the ability to meaningfully benchmark and validate PDPA predictions. To address this, we developed Tumour-specific Algorithm for Ranking GEnetic Targets via Synthetic Lethality (TARGET-SL), which utilises PDPA predictions to produce a ranked list of predicted essential genes that can be validated in vitro and in vivo. This framework employs a novel strategy to benchmark PDPAs, by comparing predictions with ground truth gene essentiality data from large-scale CRISPR-knockout and drug sensitivity screens. Importantly, TARGET-SL identifies vulnerabilities that are more exclusive to individual tumours than predictions based on canonical driver genes. We further find that TARGET-SL is better at identifying sample-specific vulnerabilities than other similar tools.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12145226/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144246529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scATD: a high-throughput and interpretable framework for single-cell cancer drug resistance prediction and biomarker identification.","authors":"Murong Zhou, Zeyu Luo, Yu-Hang Yin, Qiaoming Liu, Guohua Wang, Yuming Zhao","doi":"10.1093/bib/bbaf268","DOIUrl":"10.1093/bib/bbaf268","url":null,"abstract":"<p><p>Transfer learning has been widely applied to drug sensitivity prediction based on single-cell RNA sequencing, leveraging knowledge from large datasets of cancer cell lines or other sources to improve the prediction of drug responses. However, previous studies require model fine-tuning for different patient single-cell datasets, limiting their ability to meet the clinical need for high-throughput rapid prediction. In this research, we introduce single-cell Adaptive Transfer and Distillation model (scATD), a transfer learning framework leveraging large language models for high-throughput drug sensitivity prediction. Based on different large language models (scFoundation and Geneformer) and transfer strategies, scATD includes three distinct sub-models: scATD-sf, scATD-gf, and scATD-sf-dist. scATD-sf and scATD-gf employ a bidirectional style transfer to enable predictions for new patients without model parameter training. Additionally, scATD-sf-dist uses knowledge distillation from large models to enhance prediction performance, improve efficiency, and reduce resource requirements. Benchmarking across more diverse datasets demonstrates scATD's superior accuracy, generalization and efficiency. In addition, by rigorously selecting reference background samples for feature attribution algorithms, scATD provides more meaningful insights into the relationship between gene expression and drug resistance mechanisms. This interpretability makes scATD well suited to addressing critical challenges in precision oncology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159290/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}