Bioinformatics (Oxford, England): latest articles

SimSon: simple contrastive learning of SMILES for molecular property prediction.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf275
Chae Eun Lee, Jin Sob Kim, Jin Hong Min, Sung Won Han
Motivation: Molecular property prediction with deep learning has accelerated drug discovery and retrosynthesis. However, the shortage of labeled molecular data and the difficulty of generalizing across vast chemical spaces pose significant hurdles for applying deep learning to molecular property prediction. This study proposes a self-supervised framework for learning Simplified Molecular Input Line Entry System (SMILES) representations, which we call Simple SMILES contrastive learning (SimSon). SimSon was pre-trained with contrastive learning on unlabeled SMILES data to capture SMILES representations.
Results: Our findings demonstrate that contrastive learning with randomized SMILES improves the model's generalization and robustness because it captures the global semantic context at the molecular level. In downstream tasks, SimSon performs competitively with graph-based methods and even outperforms them on certain benchmark datasets. These results indicate that SimSon effectively captures structural information from SMILES, exhibiting remarkable generalization and robustness. The potential applications of SimSon extend to bioinformatics and cheminformatics, including drug discovery and drug-drug interaction prediction.
Availability and implementation: The source code is available at https://github.com/lee00206/SimSon.
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12124188/pdf/
Citations: 0
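SimSon pre-trains on pairs of randomized SMILES of the same molecule with contrastive learning. The abstract does not name the loss, so the sketch below assumes a SimCLR-style NT-Xent objective and uses plain Python lists in place of encoder outputs; `nt_xent` and the temperature `tau` are illustrative, not taken from the SimSon repository. (Randomized SMILES strings themselves can be generated, for example, with RDKit's `Chem.MolToSmiles(mol, doRandom=True)`.)

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent(view_a: List[Vector], view_b: List[Vector], tau: float = 0.1) -> float:
    """NT-Xent loss for N molecules, each seen as two randomized SMILES.

    view_a[i] and view_b[i] embed two SMILES strings of molecule i (the positive
    pair); the other 2N - 2 embeddings in the batch act as negatives.
    """
    z = [_normalize(v) for v in list(view_a) + list(view_b)]
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same molecule
        # Temperature-scaled cosine similarity to every other embedding.
        logits = {
            j: sum(x * y for x, y in zip(z[i], z[j])) / tau
            for j in range(2 * n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += log_denominator - logits[pos]  # -log softmax of the positive pair
    return total / (2 * n)


# Toy check: matched views should give a lower loss than shuffled ones.
aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)  # True
```

Minimizing this loss pulls the two SMILES views of one molecule together while pushing apart views of different molecules, which is the mechanism the abstract credits for robustness to SMILES randomization.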
Missense variants pathogenicity annotation from homologous proteins.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf305
Gabriel Ruiz-Alías, Sergi Soldevila, Xavier Altafaj, Arnau Cordomí, Mireia Olivella
Motivation: High-throughput DNA sequencing has revealed millions of single nucleotide variants (SNVs) in the human genome, only a small fraction of which are linked to disease. The effect of missense variants, which alter the protein sequence, is particularly challenging to interpret because clinical annotations and experimental information are scarce. Although current prediction tools use conservation and structural information, they still struggle to predict variant pathogenicity. In this study, we explored the pathogenicity of homologous missense variants (variants at equivalent positions across homologous proteins), focusing on proteins involved in autosomal dominant diseases.
Results: Our analysis of 2976 pathogenic and 17 555 non-pathogenic homologous variants demonstrated that pathogenicity can be extrapolated with 95% accuracy within a protein family, or up to 98% for closer homologs. Remarkably, an evaluation of 27 commonly used mutation predictors revealed that they do not fully capture this biological feature. To facilitate the exploration of homologous variants, we created HomolVar, a web server that predicts the pathogenicity of missense variants using annotations from homologous variants, freely available at https://rarevariants.org/HomolVar. Overall, these findings and the accompanying tool offer a robust method for predicting the pathogenicity of unannotated variants, strengthening genotype-phenotype correlations and contributing to the diagnosis of rare genetic disorders.
Availability and implementation: HomolVar is freely available at https://rarevariants.org/HomolVar.
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12122210/pdf/
Citations: 0
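The core idea behind homologous variants is that a residue in one protein maps to an equivalent residue in a homolog through an alignment, so a known clinical label can be carried across. The sketch below shows only that position-mapping step on a pairwise alignment; `map_position` and `transfer_label` are hypothetical helpers, and HomolVar's actual pipeline (family alignments, filtering, scoring) is not described in the abstract.

```python
from typing import Dict, Optional


def map_position(aln_a: str, aln_b: str, pos_a: int) -> Optional[int]:
    """Map a 1-based residue number in protein A to the equivalent residue in B.

    aln_a and aln_b are the two rows of a pairwise alignment ('-' marks a gap).
    Returns None when the residue in A is aligned to a gap in B.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    res_a = res_b = 0
    for col_a, col_b in zip(aln_a, aln_b):
        if col_a != "-":
            res_a += 1
        if col_b != "-":
            res_b += 1
        if col_a != "-" and res_a == pos_a:
            return res_b if col_b != "-" else None
    return None


def transfer_label(aln_a: str, aln_b: str, labels_b: Dict[int, str], pos_a: int) -> Optional[str]:
    """Return the known label (e.g. 'pathogenic') at the homologous position in B."""
    pos_b = map_position(aln_a, aln_b, pos_a)
    return None if pos_b is None else labels_b.get(pos_b)


# Hypothetical toy alignment: residue 3 (L) of A corresponds to residue 4 of B.
print(transfer_label("MK-LV", "MKALI", {4: "pathogenic"}, 3))  # pathogenic
```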
Topology-based metrics for finding the optimal sparsity in gene regulatory network inference.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf120
Nils Lundqvist, Mateusz Garbulowski, Thomas Hillerton, Erik L L Sonnhammer
Motivation: Gene regulatory network (GRN) inference is a complex task that aims to unravel regulatory interactions between genes in a cell. A major shortcoming of most GRN inference methods is that they do not attempt to find the optimal sparsity, i.e. the single best GRN, which is important when applying GRN inference in a real situation. Instead, sparsity tends to be controlled by an arbitrarily set hyperparameter.
Results: In this paper, two new methods for predicting the optimal sparsity of GRNs are formulated and benchmarked on simulated perturbation-based gene expression data using four GRN inference methods: LASSO, Zscore, LSCON, and GENIE3. Both sparsity prediction methods rest on the hypothesis that the topology of real GRNs is scale-free, and they are evaluated on their ability to predict the sparsity of the true GRN. The results show that the new topology-based approaches reliably predict a sparsity close to the true one. This ability is valuable for real-world applications in which a single GRN is inferred from real data; in such situations, it is vital to infer a GRN with the correct sparsity.
Availability and implementation: https://bitbucket.org/sonnhammergrni/powerlaw_sparsity/ and https://codeocean.com/capsule/4393635/.
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057811/pdf/
Citations: 0
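The abstract states that both sparsity predictors assume real GRNs are scale-free but does not spell out the two metrics. As a stand-in, the sketch below scores each candidate network by how well its degree distribution fits a descending straight line on log-log axes (a crude power-law check) and keeps the best-scoring sparsity level. `scale_free_score` and `select_sparsity` are hypothetical; in practice the candidates would come from one inference method (e.g. LASSO) run at several regularization strengths.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def degree_sequence(edges: Iterable[Edge]) -> List[int]:
    """Total degree (in + out) of every node that appears in at least one edge."""
    degree: Counter = Counter()
    for regulator, target in edges:
        degree[regulator] += 1
        degree[target] += 1
    return list(degree.values())


def scale_free_score(degrees: List[int]) -> float:
    """R^2 of a linear fit of log P(k) on log k; 0 unless the slope is negative.

    Fewer than three distinct degrees cannot support a meaningful fit.
    """
    counts = Counter(degrees)
    if len(counts) < 3:
        return 0.0
    total = len(degrees)
    points = [(math.log(k), math.log(c / total)) for k, c in counts.items()]
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    if sxx == 0 or syy == 0 or sxy >= 0:  # flat or increasing: not scale-free
        return 0.0
    return sxy * sxy / (sxx * syy)


def select_sparsity(candidates: Dict[float, List[Edge]]) -> float:
    """Pick the candidate sparsity level whose network looks most scale-free."""
    return max(candidates, key=lambda s: scale_free_score(degree_sequence(candidates[s])))


hub = [("H", "A"), ("H", "B"), ("H", "c"), ("H", "d"), ("A", "e"), ("B", "f")]
ring = [("a", "b"), ("b", "c"), ("c", "a")]
print(select_sparsity({0.3: ring, 0.1: hub}))  # 0.1
```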
EPIPDLF: a pretrained deep learning framework for predicting enhancer-promoter interactions.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btae716
Zhichao Xiao, Yan Li, Yijie Ding, Liang Yu
Motivation: Enhancers and promoters are regulatory DNA elements that play pivotal roles in gene expression, homeostasis, and disease development across various biological processes. Research has shown that distal enhancers may engage with nearby promoters to modulate the expression of target genes, a finding with significant implications for understanding many biological mechanisms. In recent years, numerous high-throughput wet-lab techniques have been developed to detect possible interactions between enhancers and promoters. However, these experimental methods are often time-intensive and costly.
Results: To tackle this issue, we developed EPIPDLF, a deep learning approach that predicts enhancer-promoter interactions (EPIs) from genomic sequences alone in an interpretable manner. Comparative evaluations across six benchmark datasets demonstrate that EPIPDLF consistently achieves superior performance in EPI prediction. Additionally, its interpretable analysis mechanisms allow the learned features to be elucidated, aiding the identification and biological analysis of important sequences.
Availability and implementation: The source code and data are available at https://github.com/xzc196/EPIPDLF.
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12057809/pdf/
Citations: 0
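EPIPDLF works from genomic sequence alone with a pretrained model, but the abstract does not describe its input encoding. Many pretrained DNA language models (DNABERT, for example) tokenize sequences into overlapping k-mers; the sketch below assumes that style of input for an enhancer-promoter pair. `kmer_tokens` and `pair_tokens` are hypothetical helpers, not EPIPDLF code.

```python
from typing import List, Tuple


def kmer_tokens(seq: str, k: int = 6) -> List[str]:
    """Overlapping k-mers of a DNA sequence, skipping any k-mer with a non-ACGT base."""
    seq = seq.upper()
    kmers = (seq[i : i + k] for i in range(len(seq) - k + 1))
    return [kmer for kmer in kmers if set(kmer) <= set("ACGT")]


def pair_tokens(enhancer: str, promoter: str, k: int = 6) -> Tuple[List[str], List[str]]:
    """Tokenize both members of a candidate enhancer-promoter pair."""
    return kmer_tokens(enhancer, k), kmer_tokens(promoter, k)


enh, prom = pair_tokens("ACGTACN", "TTGACA", k=3)
print(enh)   # ['ACG', 'CGT', 'GTA', 'TAC']
print(prom)  # ['TTG', 'TGA', 'GAC', 'ACA']
```

Each token list would then be embedded by the sequence encoder, and a classifier would score the pair as interacting or not; how EPIPDLF combines the two sequences is not stated in the abstract.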
Automatic biomarker discovery and enrichment with BRAD.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf159
Joshua Pickard, Ram Prakash, Marc Andrew Choi, Natalie Oliven, Cooper Stansbury, Jillian Cwycyshyn, Nicholas Galioto, Alex Gorodetsky, Alvaro Velasquez, Indika Rajapakse
Motivation: Integrating large language models (LLMs) with research tools presents technical and reproducibility challenges for biomedical research. While commercial artificial intelligence (AI) systems are easy to adopt, they obscure data provenance, lack transparency, and can generate false information, making them unfit for many research problems. To address these challenges, we developed the Bioinformatics Retrieval Augmented Digital (BRAD) agent software system.
Results: Here, we introduce BRAD, an agentic system that integrates LLMs with external tools and data to streamline research workflows. BRAD's modular agents retrieve information from literature, custom software, and online databases while maintaining transparent protocols to increase the reliability of AI-generated results. We apply BRAD to a biomarker discovery pipeline, automating both its execution and the generation of enrichment reports. This workflow places user data in the context of the literature, enabling a level of interpretation and automation beyond that of conventional research tools. Beyond the workflow highlighted here, BRAD is a flexible system that has been deployed in other applications, including a chatbot, video retrieval-augmented generation (RAG), and single-cell data analysis.
Availability and implementation: The source code for BRAD is available at https://github.com/Jpickard1/BRAD; pip installation instructions, tutorials, documentation, and further information are available on ReadTheDocs.
Volume/issue: 41(5)
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12064167/pdf/
Citations: 0
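BRAD's biomarker workflow ends in enrichment reports, but the abstract does not say which statistic they use. A standard choice for over-representation analysis is the one-sided hypergeometric test: given a background of N genes of which K belong to a pathway, how surprising is it that k of n selected biomarkers fall in that pathway? The sketch below is a generic implementation of that test, not BRAD's code, and it applies no multiple-testing correction.

```python
from math import comb
from typing import Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    top = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def enrichment_pvalue(biomarkers: Set[str], pathway: Set[str], background: Set[str]) -> float:
    """One-sided over-representation p-value of a pathway among selected biomarkers."""
    selected = biomarkers & background
    members = pathway & background
    overlap = len(selected & members)
    return hypergeom_sf(overlap, len(background), len(members), len(selected))


background = {f"g{i}" for i in range(1, 21)}  # hypothetical 20-gene universe
pathway = {"g1", "g2", "g3", "g4"}
print(round(enrichment_pvalue({"g1", "g2", "g3", "g5"}, pathway, background), 4))  # 0.0134
```

Here 3 of 4 biomarkers hit a 4-gene pathway in a 20-gene background, giving p = 65/4845; a real report would repeat this per pathway and correct for the number of tests.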
ProtNote: a multimodal method for protein-function annotation.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf170
Samir Char, Nathaniel Corley, Sarah Alamdari, Kevin K Yang, Ava P Amini
Motivation: Understanding the protein sequence-function relationship is essential for advancing protein biology and engineering. However, <1% of known protein sequences have human-verified functions. While deep-learning methods have shown promise for protein-function prediction, current models can predict only the functions on which they were trained.
Results: Here, we introduce ProtNote, a multimodal deep-learning model that leverages free-form text to enable both supervised and zero-shot protein-function prediction. ProtNote not only maintains near state-of-the-art performance for annotations in its training set but also generalizes to unseen and novel functions in zero-shot test settings. Compared with baseline models, ProtNote performs better at predicting novel Gene Ontology annotations and Enzyme Commission numbers by capturing nuanced sequence-function relationships, unlocking a range of biological use cases inaccessible to prior models. We envision that ProtNote will enhance protein-function discovery by enabling scientists to use free-text inputs without being restricted to predefined labels, a necessary capability for navigating the dynamic landscape of protein biology.
Availability and implementation: The code is available on GitHub: https://github.com/microsoft/protnote; model weights, datasets, and evaluation metrics are provided via Zenodo: https://zenodo.org/records/13897920.
Volume/issue: 41(5)
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12054973/pdf/
Citations: 0
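What makes zero-shot annotation possible is that labels are represented by embeddings of their free-text descriptions, so a function never seen in training can still be scored against a protein. The sketch below illustrates that idea with cosine similarity passed through a sigmoid, giving each label an independent probability; the scoring rule, `scale`, `threshold`, and the toy 2-D vectors are assumptions standing in for ProtNote's actual sequence and text encoders.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def zero_shot_annotate(
    protein: Vector, label_texts: Dict[str, Vector], scale: float = 10.0, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Score every free-text label against one protein embedding.

    Labels are just text embeddings, so new functions can be added at inference
    time without retraining; each label gets an independent sigmoid probability.
    """
    hits = []
    for label, text_vec in label_texts.items():
        prob = 1.0 / (1.0 + math.exp(-scale * cosine(protein, text_vec)))
        if prob > threshold:
            hits.append((label, prob))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy 2-D embeddings standing in for real sequence and text encoders.
labels = {"kinase activity": [1.0, 0.1], "DNA binding": [-1.0, 0.0], "transport": [0.0, 1.0]}
print([name for name, _ in zero_shot_annotate([1.0, 0.0], labels)])  # ['kinase activity']
```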
Pitfalls of bacterial pan-genome analysis approaches: a case study of Mycobacterium tuberculosis and two less clonal bacterial species.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf219
Maximillian G Marin, Natalia Quinones-Olvera, Christoph Wippel, Mahboobeh Behruznia, Brendan M Jeffrey, Michael Harris, Brendon C Mann, Alex Rosenthal, Karen R Jacobson, Robin M Warren, Heng Li, Conor J Meehan, Maha R Farhat
Summary: Pan-genome analysis is a fundamental tool for studying bacterial genome evolution; however, the variety of methods used to define and measure the pan-genome complicates the interpretation and reliability of results. Using Mycobacterium tuberculosis, a clonally evolving bacterium with a small accessory genome, as a model system, we systematically evaluated sources of variability in pan-genome estimates. Our analysis revealed that differences in assembly type (short-read versus hybrid), annotation pipeline, and pan-genome software significantly affect predictions of core and accessory genome size. Extending our analysis to two additional bacterial species, Escherichia coli and Staphylococcus aureus, we observed consistent tool-dependent biases but species-specific patterns in pan-genome variability. Our findings highlight the importance of integrating nucleotide- and protein-level analyses to improve the reliability and reproducibility of pan-genome studies across diverse bacterial populations.
Availability and implementation: Panqc is freely available under an MIT license at https://github.com/maxgmarin/panqc.
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12119186/pdf/
Citations: 0
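One reason assembly and annotation choices shift core and accessory genome sizes is that the split is a prevalence threshold over a gene presence/absence matrix: a gene missed by annotation in a single isolate can fall out of the core. The sketch below shows that mechanism with a hypothetical helper and a 99% cutoff (a common convention); it does not use panqc's interface.

```python
from typing import Dict, Set, Tuple


def partition_pangenome(
    presence: Dict[str, Set[str]], n_genomes: int, core_fraction: float = 0.99
) -> Tuple[Set[str], Set[str]]:
    """Split gene clusters into core and accessory sets by prevalence.

    presence maps each gene cluster to the set of genomes it was found in.
    """
    core = {
        gene for gene, genomes in presence.items() if len(genomes) / n_genomes >= core_fraction
    }
    return core, set(presence) - core


# Hypothetical clusters from four isolates; geneB was missed in one assembly.
presence = {"geneA": {"g1", "g2", "g3", "g4"}, "geneB": {"g1", "g2", "g3"}, "geneC": {"g1"}}
for cutoff in (0.99, 0.75):
    core, accessory = partition_pangenome(presence, 4, cutoff)
    print(cutoff, sorted(core), sorted(accessory))
# 0.99 ['geneA'] ['geneB', 'geneC']
# 0.75 ['geneA', 'geneB'] ['geneC']
```

In a clonal species with a small accessory genome, such as M. tuberculosis, a handful of these annotation misses can noticeably inflate the apparent accessory genome.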
Benchmarking the methods for predicting base pairs in RNA-RNA interactions.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf289
Mei Lang, Thomas Litfin, Ke Chen, Jian Zhan, Yaoqi Zhou
Motivation: The intricate network of RNA-RNA interactions, crucial for orchestrating essential cellular processes such as transcriptional and translational regulation, is being unveiled through high-throughput techniques and computational predictions. Because experimental determination of RNA-RNA interactions at base-pair resolution remains challenging, a timely assessment of complementary computational tools is needed, particularly given the recent emergence of deep learning-based methods.
Results: Here, we used base pairs derived from three-dimensional RNA complex structures as a gold-standard benchmark to assess the performance of 23 methods, ranging from alignment-based and free-energy-minimization methods to deep-learning techniques. The results indicate that a deep-learning-based method, SPOT-RNA, generalizes to accurate zero-shot predictions of RNA-RNA interactions not only between previously unseen RNA structures but also between RNAs without monomeric structures. This finding underscores the potential of deep learning as a robust tool for advancing our understanding of these complex molecular interactions.
Availability: All data and code are available at https://github.com/meilanglang/RNA-RNA-Interaction.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
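Benchmarking against base pairs extracted from 3D complex structures reduces to comparing a predicted set of intermolecular pairs with a reference set. The abstract does not list the exact metrics; the sketch below computes precision, recall, and F1 with exact position matching, a common choice for base-pair evaluation (some benchmarks also tolerate small positional shifts, which this sketch does not).

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]  # (position in RNA 1, position in RNA 2)


def base_pair_scores(predicted: Set[Pair], reference: Set[Pair]) -> Dict[str, float]:
    """Precision, recall (sensitivity), and F1 of predicted intermolecular base pairs."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


reference = {(1, 10), (2, 9), (3, 8)}  # hypothetical pairs from a 3D complex structure
predicted = {(1, 10), (2, 9), (4, 7)}
print(base_pair_scores(predicted, reference))
```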
Algebraic differentiation for fast sensitivity analysis of optimal flux modes in metabolic models.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf287
Hester Chapman, Miroslav Kratochvíl, Oliver Ebenhöh, St Elmo Wilken
Motivation: Sensitivity analysis is a useful tool for identifying key parameters in metabolic models. It is typically applied only to the growth rate, disregarding the sensitivity of other solution variables to parameters. Further, sensitivity analysis of elementary flux modes (EFMs) could provide low-dimensional insights into optimal solutions, but EFMs are not defined when a model is subject to inhomogeneous flux constraints, such as the frequently used ATP maintenance reaction.
Results: We introduce optimal flux modes (OFMs), an analogue of EFMs that applies specifically to optimal solutions of constraint-based models. Further, we prove that implicit differentiation can always be used to efficiently calculate the sensitivities of both whole-model solutions and OFM-based solutions to model parameters. This allows fine-grained sensitivity analysis of the optimal solution and investigation of how these parameters exert control over the optimal composition of OFMs. The framework is implemented in DifferentiableMetabolism.jl, a software package designed to efficiently differentiate solutions of constraint-based models. To demonstrate scalability, we differentiate solutions of 342 yeast models; additionally, we show that sensitivities of specific subsystems can guide metabolic engineering. Applying our scheme to an Escherichia coli model, we find that OFM sensitivities predict the effect of knockout experiments on waste product accumulation. Sensitivity analysis of OFMs also provides key insights into metabolic changes resulting from parameter perturbations.
Availability and implementation: The software introduced here is available as the open-source Julia packages DifferentiableMetabolism.jl (https://github.com/stelmo/DifferentiableMetabolism.jl) and ElementaryFluxModes.jl (https://github.com/HettieC/ElementaryFluxModes.jl), both of which work on all major operating systems and computer architectures. Code to reproduce all results is available from https://github.com/HettieC/DifferentiableOFMPaper, and as an archive from https://doi.org/10.5281/zenodo.15183208.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
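The identity behind implicit differentiation is the implicit function theorem: if the optimum x* satisfies F(x*, p) = 0 with an invertible Jacobian dF/dx, then dx*/dp = -(dF/dx)^-1 dF/dp, with no need to re-solve the model. The sketch below applies it to a hypothetical two-flux toy model whose active constraints are written as equations; it is plain Python using Cramer's rule for a 2x2 system, not the DifferentiableMetabolism.jl API, and real constraint-based models would instead differentiate the full optimality (KKT) conditions with sparse solvers.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def solve_2x2(a: Matrix, rhs: Sequence[float]) -> List[float]:
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("Jacobian is singular; the implicit function theorem does not apply")
    return [
        (rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
        (a[0][0] * rhs[1] - rhs[0] * a[1][0]) / det,
    ]


def implicit_sensitivity(dF_dx: Matrix, dF_dp: Sequence[float]) -> List[float]:
    """dx*/dp = -(dF/dx)^-1 dF/dp for a solution x* of F(x, p) = 0."""
    return solve_2x2(dF_dx, [-g for g in dF_dp])


# Toy model: uptake flux v1 is pinned at its bound p, and the product flux is
# v2 = 0.5 * v1. Active constraints: F(v, p) = [v1 - p, v2 - 0.5 * v1] = 0.
dF_dv = [[1.0, 0.0], [-0.5, 1.0]]
dF_dp = [-1.0, 0.0]
print(implicit_sensitivity(dF_dv, dF_dp))  # [1.0, 0.5]
```

The result matches differentiating the closed-form solution v1 = p, v2 = 0.5p directly.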
ProBASS-a language model with sequence and structural features for predicting the effect of mutations on binding affinity.
Bioinformatics (Oxford, England) Pub Date: 2025-05-06 DOI: 10.1093/bioinformatics/btaf270
Sagara N S Gurusinghe, Yibing Wu, William DeGrado, Julia M Shifman
Motivation: Protein-protein interactions (PPIs) govern virtually all cellular processes, and a single mutation within a PPI can significantly affect protein function, potentially leading to disease. While numerous approaches have emerged to predict changes in the free energy of binding due to mutations (ΔΔGbind), most lack precision. Recently, protein language models (PLMs) have shown powerful predictive capabilities by leveraging both sequence and structural data from protein complexes, yet they have not been optimized specifically for ΔΔGbind prediction.
Results: We developed ProBASS (Protein Binding Affinity from Structure and Sequence), an approach that predicts the effects of mutations on ΔΔGbind using two of the most advanced PLMs, ESM2 and ESM-IF1, which incorporate sequence and structural features, respectively. We first generated embeddings for each PPI mutant from the two PLMs and then fine-tuned ProBASS by training on a large dataset of experimental ΔΔGbind values. When training and testing were done on the same PPI, ProBASS achieved correlations with experimental ΔΔGbind values of 0.83 ± 0.05 and 0.69 ± 0.04 for single and double mutations, respectively. Additionally, when evaluated on a dataset of 2,325 single mutations across 131 PPIs, ProBASS reached a correlation of 0.81 ± 0.02, substantially outperforming other PLMs in predictive accuracy. Our results demonstrate that refining pre-trained PLMs with extensive ΔΔGbind datasets across multiple PPIs is a successful approach for creating a precise and broadly applicable ΔΔGbind prediction model, facilitating future protein engineering and design studies. ProBASS's accuracy could be further improved by retraining as more experimental data become available.
Availability and implementation: ProBASS is available at https://colab.research.google.com/github/sagagugit/ProBASS/blob/main/ProBASS.ipynb.
Citations: 0
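ProBASS is evaluated by the correlation between predicted and experimental ΔΔGbind values. The abstract does not say which correlation coefficient is reported; the sketch below assumes Pearson's r and uses made-up values purely for illustration.

```python
import math
from typing import Sequence


def pearson_r(pred: Sequence[float], expt: Sequence[float]) -> float:
    """Pearson correlation between predicted and experimental ddG_bind values."""
    if len(pred) != len(expt) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mp = sum(pred) / len(pred)
    me = sum(expt) / len(expt)
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, expt))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    se = math.sqrt(sum((e - me) ** 2 for e in expt))
    return cov / (sp * se)


# Hypothetical ddG_bind values (kcal/mol) for four mutants.
predicted = [0.2, 1.1, 2.5, -0.3]
measured = [0.0, 1.4, 2.2, -0.1]
print(round(pearson_r(predicted, measured), 3))
```

Reporting a value such as 0.83 ± 0.05 would additionally require repeating this over several train/test splits and summarizing the spread.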