{"title":"Computational methods and data resources for predicting tumor neoantigens.","authors":"Xiaofei Zhao, Lei Wei, Xuegong Zhang","doi":"10.1093/bib/bbaf302","DOIUrl":"10.1093/bib/bbaf302","url":null,"abstract":"<p><p>Neoantigens are tumor-specific antigens presented exclusively by cancer cells. These antigens are recognized as nonself by the host immune system, thereby eliciting an antitumor T-cell response. This response is significantly enhanced through neoantigen-based immunotherapies, such as personalized cancer vaccines. The repertoire of neoantigens is unique to each cancer patient, necessitating neoantigen prediction for designing patient-specific immunotherapies. This review presents the computational methods and data resources used for neoantigen prediction, as well as the prediction-associated challenges. Neoantigen prediction typically uses human leukocyte antigen typing, RNA-seq transcript quantification, somatic variant calling, peptide-major histocompatibility complex (pMHC) presentation prediction, and pMHC recognition prediction as the main computational steps. The immunoinformatics tools used for these steps and for the overall prediction of neoantigens are systematically summarized and detailed in this review.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12222050/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144552339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DrugProtAI: A machine learning-driven approach for predicting protein druggability through feature engineering and robust partition-based ensemble methods.","authors":"Ankit Halder, Sabyasachi Samantaray, Sahil Barbade, Aditya Gupta, Sanjeeva Srivastava","doi":"10.1093/bib/bbaf330","DOIUrl":"10.1093/bib/bbaf330","url":null,"abstract":"<p><p>Drug design and development are central to clinical research, yet 90% of drugs fail to reach the clinic, often due to inappropriate selection of drug targets. Conventional methods for target identification lack precision and sensitivity. While various computational tools have been developed to predict the druggability of proteins, they often focus on limited subsets of the human proteome or rely solely on amino acid properties. Our study presents DrugProtAI, a tool developed by implementing a partitioning-based method and trained on the entire human protein set using both sequence- and non-sequence-derived properties. The partitioned method was evaluated using popular machine learning algorithms, of which Random Forest and XGBoost performed the best. A comprehensive analysis of 183 features, encompassing biophysical, sequence-, and non-sequence-derived properties, achieved a median Area Under Precision-Recall Curve (AUC) of 0.87 in target prediction. The model was further tested on a blinded validation set comprising recently approved drug targets. The key predictors were also identified, which we believe will help users in selecting appropriate drug targets. We believe that these insights are poised to significantly advance drug development. This version of the tool provides the probability of druggability for human proteins. The tool is freely accessible at https://drugprotai.pythonanywhere.com/.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12236430/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144590475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CAT: a conditional association test for microbiome data using a permutation approach.","authors":"Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert R Jenq, Christine B Peterson","doi":"10.1093/bib/bbaf326","DOIUrl":"https://doi.org/10.1093/bib/bbaf326","url":null,"abstract":"<p><p>In microbiome analysis, researchers often seek to identify taxonomic features associated with an outcome of interest. However, microbiome features are intercorrelated and linked by phylogenetic relationships, making it challenging to assess the association between an individual feature and an outcome. This paper proposes a novel conditional association test, CAT, that can account for other features and phylogenetic relatedness when testing the association between a feature and an outcome. CAT adopts a permutation approach, measuring the importance of a feature in predicting the outcome by permuting operational taxonomic unit/amplicon sequence variant counts belonging to that feature from the data and quantifying how much the association with the outcome is weakened through the change in the coefficient of determination $R^{2}$. Compared with marginal association tests, it focuses on the added value of a feature in explaining outcome variation that is not captured by other features. By leveraging global tests including PERMANOVA and MiRKAT-based methods, CAT allows association testing for continuous, binary, categorical, count, survival, and correlated outcomes. We demonstrate through simulation studies that CAT can provide a direct quantification of feature importance that is distinct from that of marginal association tests, and illustrate CAT with applications to two real-world studies on the microbiome in melanoma patients: one examining the role of the microbiome in shaping immunotherapy response, and one investigating the association between the microbiome and survival outcomes. Our results illustrate the potential of CAT to inform the design of microbiome interventions aimed at improving clinical outcomes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144607395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SingleFrag: a deep learning tool for MS/MS fragment and spectral prediction and metabolite annotation.","authors":"Maribel Pérez-Ribera, Muhammad Faizan-Khan, Roger Giné, Josep M Badia, Alexandra Junza, Oscar Yanes, Marta Sales-Pardo, Roger Guimerà","doi":"10.1093/bib/bbaf333","DOIUrl":"10.1093/bib/bbaf333","url":null,"abstract":"<p><p>Metabolite and small molecule identification via tandem mass spectrometry (MS/MS) involves matching experimental spectra with prerecorded spectra of known compounds. This process is hindered by the current lack of comprehensive reference spectral libraries. To address this gap, we need accurate in silico fragmentation tools for predicting MS/MS spectra of compounds for which empirical spectra do not exist. Here, we present SingleFrag, a novel deep learning tool that predicts individual fragments separately, rather than attempting to predict the entire fragmentation spectrum at once. Our results demonstrate that SingleFrag surpasses state-of-the-art in silico fragmentation tools, providing a powerful method for annotating unknown MS/MS spectra of known compounds. As a proof of concept, we successfully annotate three previously unidentified compounds frequently found in human samples.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12245663/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144607397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pMHChat, characterizing the interactions between major histocompatibility complex class II molecules and peptides with large language models and deep hypergraph learning.","authors":"Jiani Ma, Zhikang Wang, Cen Tong, Qi Yang, Lin Zhang, Hui Liu","doi":"10.1093/bib/bbaf321","DOIUrl":"10.1093/bib/bbaf321","url":null,"abstract":"<p><p>Characterizing the binding interactions between major histocompatibility complex (MHC) class II molecules and peptides is crucial for studying the immune system, offering potential applications for neoantigen design, vaccine development, and personalized immunotherapy. Motivated by this profound meaning, we developed a model that integrates large language models (LLMs) and deep hypergraph learning for predicting MHC class II-peptide binding reactivity, affinity, and residue contact profiling. pMHChat takes MHC pseudo-sequences and peptide sequences as inputs and processes them through four stages: LLMs fine-tune stage, feature encoding and map fusion stage, task-specific prediction stage, and downstream analysis stage. pMHChat distinguishes itself in capturing contextually relevant and high-order spatial interactions of the peptide-MHC (pMHC) complex. Specifically, in a five-fold cross-validation experiment, pMHChat achieves superior performance, with a mean area under the receiver operating characteristic curve of 0.8744 and an area under the precision-recall curve of 0.8390 in the binding reactivity task, as well as a mean Pearson correlation coefficient of 0.7311 in the binding affinity prediction task. Furthermore, pMHChat also demonstrates the best performance in both the leave-one-molecule-out setting and independent evaluation. Notably, pMHChat can provide residue contact profiling, showing its potential application in recognizing critical binding patterns of the pMHC complex. Our findings highlight pMHChat's capacity to advance both predictive accuracy and detailed insights into the MHC-peptide binding process. We anticipate that pMHChat will serve as a powerful tool for elucidating MHC-peptide interactions, with promising applications in immunological research and therapeutic development.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12229989/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A free energy perturbation-assisted machine learning strategy for mimotope screening in neoantigen-based vaccine design.","authors":"Qinglu Zhong, Kevin C Chan, Lei Fu, Ruhong Zhou","doi":"10.1093/bib/bbaf254","DOIUrl":"10.1093/bib/bbaf254","url":null,"abstract":"<p><p>Neoantigen-based immunotherapy has emerged as a promising approach for cancer treatment. One key strategy in neoantigen-based vaccine design is to alter known neoantigens into enhanced mimotopes that elicit more robust immune responses. However, screening mimotopes presents challenges in both diversity and precision. While machine learning (ML) models facilitate high-throughput screening of immunogenic candidates, they struggle to distinguish mimotopes from original neoantigens (i.e. identify mimotopes with higher binding affinities, rather than solely distinguish between binding and nonbinding peptides). In contrast, alchemical methods such as free energy perturbation (FEP) provide quantitative binding free-energy differences between mimotopes and neoantigens but are computationally intensive. To leverage the strengths of both approaches, we propose an FEP-assisted ML (FEPaML) strategy that employs Bayesian optimization to iteratively refine knowledge-based predictions with physics-based evaluations, thereby progressively achieving locally optimized, precise, and robust outcomes. Our FEPaML strategy is then applied to screen mimotopes for several representative neoantigens. It has demonstrated excellent predictive precisions (exceeding 0.9) with a relatively small number of FEP samplings, significantly outperforming existing ML models.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144599483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AttenRNA: multi-scale deep attentive model with RNA feature variability analysis.","authors":"Jing Li, Quan Zou, Chao Zhan","doi":"10.1093/bib/bbaf336","DOIUrl":"10.1093/bib/bbaf336","url":null,"abstract":"<p><p>Accurate identification of diverse RNA types, including messenger RNAs (mRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), is essential for understanding their roles in gene regulation, disease progression, and epigenetic modification. Existing studies have primarily focused on binary classification tasks, such as distinguishing lncRNAs from mRNAs or identifying specific circRNAs, often overlooking the complex sequence patterns shared across multiple RNA types. To address this limitation, we developed AttenRNA, a multi-class classification model that integrates multi-scale k-mer embeddings and attention mechanisms to simultaneously differentiate between various RNA classes. AttenRNA achieved high weighted F1 scores of 89.8% and 89.6% on the validation and test sets, respectively, demonstrating strong classification performance and robustness. Dimensionality reduction using Uniform Manifold Approximation and Projection further confirmed the model's ability to learn discriminative features among RNA types. Additionally, AttenRNA exhibited strong generalization ability on cross-species data, achieving weighted F1 scores of 83.89% and 83.38% on the mouse RNA validation and test sets, respectively. These results suggest that AttenRNA offers a reliable and scalable solution for systematic RNA function analysis.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12240734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144599485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive survey and benchmark of deep learning-based methods for atomic model building from cryo-electron microscopy density maps.","authors":"Chenwei Zhang, Anne Condon, Khanh Dao Duc","doi":"10.1093/bib/bbaf322","DOIUrl":"10.1093/bib/bbaf322","url":null,"abstract":"<p><p>Advancements in deep learning (DL) have recently led to new methods for automated construction of atomic models of proteins, from single-particle cryogenic electron microscopy (cryo-EM) density maps. We conduct a comprehensive survey of these methods, distinguishing between direct model building approaches that only use density maps, and indirect ones that integrate sequence-to-structure predictions from AlphaFold. To evaluate them with better precision, we refine standard existing metrics, and benchmark a subset of representative DL-methods against traditional physics-based approaches using 50 cryo-EM density maps at varying resolutions. Our findings demonstrate that overall, DL-based methods outperform traditional physics-based methods. Our benchmark also shows the benefit of integrating AlphaFold as it improved the completeness and accuracy of the model, although its dependency on available sequence information and limited training data may limit its usage.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12253957/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144616277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A roadmap for T cell receptor-peptide-bound major histocompatibility complex binding prediction by machine learning: glimpse and foresight.","authors":"Furong Qi, Qiang Huang, Yao Xuan, Yingyin Cao, Yunyun Shen, Yihan Ren, Zhe Liu, Zheng Zhang","doi":"10.1093/bib/bbaf327","DOIUrl":"https://doi.org/10.1093/bib/bbaf327","url":null,"abstract":"<p><p>Cytotoxic T lymphocytes (CTLs) play a key role in the defense of cancer and infectious diseases. CTLs are mainly activated by T cell receptors (TCRs) after recognizing the peptide-bound class I major histocompatibility complex, and subsequently kill virus-infected cells and tumor cells. Therefore, identification of antigen-specific CTLs and their TCRs is a promising agent for T-cell based intervention. Currently, the experimental identification and validation of antigen-specific CTLs is well-used but extremely resource-intensive. The machine learning methods for TCR-pMHC prediction are growing interest particularly with advances in single-cell technologies. This review clarifies the key biological processes involved in TCR-pMHC binding. After comprehensively comparing the advantages and disadvantages of several state-of-the-art machine learning algorithms for TCR-pMHC prediction, we point out the discrepancies with these machine learning methods under specific disease conditions. Finally, we proposed a roadmap of TCR-pMHC prediction. This roadmap would enable more accurate TCR-pMHC binding prediction when improving data quality, encoding and embedding methods, training models, and application context. This review could facilitate the development of T-cell based vaccines and therapy.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144625385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AFToolkit: a framework for molecular modeling of proteins with AlphaFold-derived representations.","authors":"Maria Sindeeva, Alexander Telepov, Nikita Ivanisenko, Tatiana Shashkova, Kuzma Khrabrov, Artem Tsypin, Artur Kadurin, Olga Kardymon","doi":"10.1093/bib/bbaf324","DOIUrl":"https://doi.org/10.1093/bib/bbaf324","url":null,"abstract":"<p><p>A key challenge in protein engineering is understanding how mutations affect protein fitness and stability. Most of current state-of-the-art models fine-tune protein structure prediction or protein language models or even pretrain their own. Despite its widespread use within computational workflows, AlphaFold2 exhibits limited sensitivity in assessing the effects of amino acid point mutations on protein structure, thereby constraining its utility in sequence design and protein engineering. In this work, we propose a simple modification of AlphaFold2 inference that improves the model's capacity to capture the structural impacts of amino acid mutations. We achieve this by discarding the multiple sequence alignment and masking the template in recycling stages. Moreover, we introduce AFToolkit, a framework that leverages the embeddings of the modified AlphaFold2 model and simple adapter models to solve multiple protein engineering tasks. In contrast to other methods, our approach does not require fine-tuning the AlphaFold2 model or pretraining a new model from scratch on large datasets. It also supports handling multiple mutations, insertions, and deletions by directly modifying the input protein sequence. The proposed approach achieves strong performance across established benchmarks in terms of Spearman correlation: $0.68$ on PTMul, $0.60$ on cDNA-indel, and $0.57$ on C380.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 4","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144574840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}