Title: One-shot Evaluation of Protein Mutability and Epistasis Score Using Structure-Based Model ESM3
Authors: Ngai Hei Ernest Ho, I-Son Ng, Jo-Shu Chang
Journal: Journal of the Taiwan Institute of Chemical Engineers, Volume 179, Article 106413 (Impact Factor 6.3; JCR Q1, Engineering, Chemical)
Published: 2025-09-20
DOI: 10.1016/j.jtice.2025.106413
URL: https://www.sciencedirect.com/science/article/pii/S1876107025004638
Citation count: 0
Abstract
Background
Existing mutation scoring methods, often designed for predicting genetic diseases in the highly conserved human genome, lack generalizability across diverse protein fold types. Conventional alignment-based mutation scoring protocols lack causal explanations for conservation patterns, such as intra-chain and inter-chain interactions, due to insufficient structural awareness. Here, we leveraged the generative capabilities of the structure-based transformer model ESM3 to construct a UniProtKB-scale database of amino acid mutation and epistasis scores.
Methods
Mutation scores (M-scores) were calculated as average differences between mutant and wild-type embeddings. In contrast, epistasis scores (E-scores) were derived from the variance among mutant embeddings, mathematically representing mutational variability due to epistatic interactions. To facilitate score calculations using Euclidean distance, the original 1536-dimensional ESM3 embeddings were compressed to 312 dimensions.
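One way to read the two definitions above is sketched below for a single residue position. This is a minimal illustration rather than the authors' implementation. It assumes the M-score is the mean Euclidean distance from each mutant embedding to the wild-type embedding, and that the E-score is the mean squared distance of the mutant embeddings from their own centroid (their total variance). The function names `m_score` and `e_score`, the choice of 19 substitutions per position, and the random stand-in vectors are all hypothetical; real inputs would be the 312-dimensional embeddings described in the abstract.

```python
import math
import random
from statistics import fmean


def m_score(wild_type, mutants):
    """Mean Euclidean distance between each mutant and the wild-type embedding."""
    return fmean([math.dist(m, wild_type) for m in mutants])


def e_score(mutants):
    """Total variance of the mutant embeddings around their centroid."""
    # Centroid: per-dimension mean across all mutant embeddings.
    centroid = [fmean(column) for column in zip(*mutants)]
    # Mean squared Euclidean distance of each mutant from that centroid.
    return fmean([math.dist(m, centroid) ** 2 for m in mutants])


if __name__ == "__main__":
    # Random stand-ins for real embeddings (assumption: 19 substitutions, 312 dims).
    rng = random.Random(0)
    dim = 312
    wild_type = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    mutants = [[w + rng.gauss(0.0, 0.1) for w in wild_type] for _ in range(19)]
    print("M-score:", m_score(wild_type, mutants))
    print("E-score:", e_score(mutants))
```

Under this reading, a position whose substitutions all shift the embedding by a similar amount and direction has a high M-score and a low E-score. A position whose substitutions scatter the embedding in different directions has a high E-score.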
Significant Findings
Our approach captured protein mutability patterns beyond those explained by peptide flexibility and phylogenetic analysis. Benchmarking against reported in silico protein evolution datasets showed a significant correlation with mega-scale Bayesian optimization experimental results. M-score and E-score profiles exhibited intra-family variations compared to root mean square fluctuation (RMSF) profiles, revealing evolutionary constraints beyond those captured by kinetic energy and family-level phylogeny. This divergence proved functionally meaningful: in low-RMSF regions, gap formation between homologs could be explained by strong regional mutability and epistasis, indicated by high M- and E-scores. Overall, M-scores and E-scores serve as interpretable, target-agnostic metrics for structurally informed, epistasis-aware mutational analysis, advancing the understanding of protein function and evolution.
About the journal:
Journal of the Taiwan Institute of Chemical Engineers (formerly known as Journal of the Chinese Institute of Chemical Engineers) publishes original works, from fundamental principles to practical applications, in the broad field of chemical engineering with special focus on three aspects: Chemical and Biomolecular Science and Technology, Energy and Environmental Science and Technology, and Materials Science and Technology. Authors should choose for their manuscript an appropriate aspect section and a few related classifications when submitting to the journal online.