One-shot Evaluation of Protein Mutability and Epistasis Score Using Structure-Based Model ESM3

Impact factor 6.3 · CAS Zone 3 (Engineering and Technology) · JCR Q1 (ENGINEERING, CHEMICAL)
Ngai Hei Ernest Ho, I-Son Ng, Jo-Shu Chang
Journal of the Taiwan Institute of Chemical Engineers, volume 179, Article 106413. Published 2025-09-20. DOI: 10.1016/j.jtice.2025.106413
https://www.sciencedirect.com/science/article/pii/S1876107025004638

Abstract

Background

Existing mutation scoring methods, often designed for predicting genetic diseases in the highly conserved human genome, lack generalizability across diverse protein fold types. Conventional alignment-based mutation scoring protocols lack causal explanations for conservation patterns, such as intra-chain and inter-chain interactions, due to insufficient structural awareness. Here, we leveraged the generative capabilities of the structure-based transformer model ESM3 to construct a UniProtKB-scale database of amino acid mutation and epistasis scores.

Methods

Mutation scores (M-scores) were calculated as the average difference between mutant and wild-type embeddings. In contrast, epistasis scores (E-scores) were derived from the variance among mutant embeddings, which mathematically represents mutational variability due to epistatic interactions. To facilitate score calculations using Euclidean distance, the original 1536-dimensional ESM3 embeddings were compressed to 312 dimensions.
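The abstract does not give the exact formulas, so the following is only a minimal sketch of how the two scores could be computed for one residue position. It assumes the M-score is the mean Euclidean distance between each mutant embedding and the wild-type embedding, and the E-score is the mean squared Euclidean distance of the mutant embeddings from their centroid. The function names and the toy two-dimensional vectors are hypothetical; real inputs would be the reduced ESM3 embeddings described above.

```python
import math
from statistics import fmean


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def m_score(wild_type, mutants):
    """Assumed M-score: mean distance of mutant embeddings from the wild type."""
    return fmean(euclidean(m, wild_type) for m in mutants)


def e_score(mutants):
    """Assumed E-score: mean squared distance of mutants from their centroid."""
    dim = len(mutants[0])
    centroid = [fmean(m[i] for m in mutants) for i in range(dim)]
    return fmean(euclidean(m, centroid) ** 2 for m in mutants)


# Toy embeddings (hypothetical values, 2 dimensions instead of 312).
wild_type = [0.0, 0.0]
mutants = [[3.0, 4.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]

print(m_score(wild_type, mutants))  # distances 5, 5, 0, 0 -> 2.5
print(e_score(mutants))             # centroid (1.5, 2.0) -> 6.25
```

Under these assumptions a position scores a high M-score when substitutions move the embedding far from the wild type, and a high E-score when different substitutions move it in different directions.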

Significant Findings

Our approach captured protein mutability patterns beyond those revealed by peptide flexibility and phylogenetic analysis. Benchmarking against reported in silico protein evolution datasets showed a significant correlation with mega-scale Bayesian optimization experimental results. M-score and E-score profiles exhibited intra-family variations compared to root mean square fluctuation (RMSF) profiles, revealing evolutionary constraints beyond kinetic energy and family-level phylogeny insights. This divergence proved functionally meaningful: in low-RMSF regions, gap formation between homologs could be explained by strong regional mutability and epistasis, indicated by high M- and E-scores. Overall, M-scores and E-scores serve as interpretable, target-agnostic metrics for structurally informed, epistasis-aware mutational analysis, advancing the understanding of protein function and evolution.
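The comparison between RMSF and the two scores can be illustrated with a small filter that picks out residues that are rigid (low RMSF) yet score high on both the M-score and the E-score. This is a sketch only: the function name, the thresholds, and the per-residue values are hypothetical, and the abstract does not state how such regions were identified.

```python
def rigid_but_mutable(rmsf, m_scores, e_scores, rmsf_max, m_min, e_min):
    """Return 0-based residue indices with low RMSF but high M- and E-scores."""
    if not (len(rmsf) == len(m_scores) == len(e_scores)):
        raise ValueError("profiles must have one value per residue")
    return [
        i
        for i, (r, m, e) in enumerate(zip(rmsf, m_scores, e_scores))
        if r <= rmsf_max and m >= m_min and e >= e_min
    ]


# Hypothetical per-residue profiles for a four-residue stretch.
rmsf = [0.4, 0.5, 2.1, 0.3]
m_scores = [0.2, 1.8, 1.9, 1.5]
e_scores = [0.1, 0.9, 0.8, 0.2]

hits = rigid_but_mutable(rmsf, m_scores, e_scores,
                         rmsf_max=1.0, m_min=1.0, e_min=0.5)
print(hits)  # [1]
```

Only residue 1 passes all three cut-offs here: residue 2 is too flexible, and residues 0 and 3 fall short on at least one score.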
Source journal: Journal of the Taiwan Institute of Chemical Engineers
CiteScore: 9.10
Self-citation rate: 14.00%
Articles published: 362
Review time: 35 days
Journal description: Journal of the Taiwan Institute of Chemical Engineers (formerly known as Journal of the Chinese Institute of Chemical Engineers) publishes original works, from fundamental principles to practical applications, in the broad field of chemical engineering with special focus on three aspects: Chemical and Biomolecular Science and Technology, Energy and Environmental Science and Technology, and Materials Science and Technology. Authors should choose an appropriate aspect section and a few related classifications for their manuscript when submitting to the journal online.