Predicting high-fitness viral protein variants with Bayesian active learning and biophysics

IF 9.1 | CAS Zone 1, Multidisciplinary | JCR Q1, MULTIDISCIPLINARY SCIENCES
Marian Huot, Dianzhuo Wang, Jiacheng Liu, Eugene I. Shakhnovich
Proceedings of the National Academy of Sciences of the United States of America (PNAS), journal article, published 2025-06-09.
DOI: 10.1073/pnas.2503742122 (https://doi.org/10.1073/pnas.2503742122)

Abstract

The early detection of high-fitness viral variants is critical for pandemic response, yet limited experimental resources at the onset of variant emergence hinder effective identification. To address this, we introduce an active learning framework, VIRAL (Viral Identification via Rapid Active Learning), that integrates a protein language model, a Gaussian process with uncertainty estimation, and a biophysical model to predict the fitness of novel variants in a few-shot learning setting. By benchmarking on past SARS-CoV-2 data, we demonstrate that our method accelerates the identification of high-fitness variants by up to fivefold compared with random sampling while requiring experimental characterization of fewer than 1% of possible variants. We also demonstrate that our framework effectively identifies sites that are frequently mutated during natural viral evolution, particularly those enabling antibody escape while preserving ACE2 binding, with a predictive advantage of up to two years over baseline strategies. Through systematic analysis of different acquisition strategies, we show that incorporating uncertainty into variant selection enables broader exploration of the sequence landscape, leading to the identification of evolutionarily distant but potentially dangerous variants. Our results suggest that VIRAL could serve as an effective early warning system for identifying concerning SARS-CoV-2 variants, and potentially emerging viruses with pandemic potential, before they achieve widespread circulation.
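The abstract combines three ingredients: a surrogate model (a Gaussian process) that returns a fitness prediction together with an uncertainty, an acquisition rule that decides which variant to characterize next, and an experiment that labels it. The following is a minimal, dependency-free sketch of that loop under stated assumptions; it is not VIRAL's implementation. The toy feature vectors stand in for protein language model embeddings, the hypothetical `oracle` function stands in for an experimental fitness measurement (the biophysical model is not reproduced here), and the upper confidence bound (UCB) rule, mean + beta * std, is one common uncertainty-aware acquisition function chosen for illustration, not necessarily the one the paper uses. All function names (`gp_posterior`, `ucb_select`, `active_learning`) are hypothetical.

```python
import math
import random

# Minimal sketch (hypothetical, not VIRAL's code): GP regression + UCB
# acquisition in an active-learning loop. Feature vectors stand in for
# protein language model embeddings; `oracle` stands in for an experiment.


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq / length_scale ** 2)


def cholesky(m):
    """Lower-triangular L with L @ L^T == m (m symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(X_train, y_train, X_query, noise=1e-2):
    """Zero-mean GP posterior mean and standard deviation at each query point."""
    n = len(X_train)
    K = [[rbf_kernel(X_train[i], X_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # alpha = K^-1 y
    means, stds = [], []
    for x in X_query:
        k_star = [rbf_kernel(xt, x) for xt in X_train]
        v = solve_lower(L, k_star)
        var = rbf_kernel(x, x) - sum(vi * vi for vi in v)
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        stds.append(math.sqrt(max(var, 0.0)))  # clip tiny negative round-off
    return means, stds


def ucb_select(means, stds, candidates, beta=2.0):
    """Index of the candidate maximizing mean + beta * std (UCB)."""
    return max(candidates, key=lambda i: means[i] + beta * stds[i])


def active_learning(X, oracle, n_init=3, n_rounds=5, beta=2.0, seed=0):
    """Label n_init random variants, then query n_rounds UCB picks one at a time."""
    rng = random.Random(seed)
    labeled = rng.sample(range(len(X)), n_init)
    y = {i: oracle(i) for i in labeled}
    for _ in range(n_rounds):
        unlabeled = [i for i in range(len(X)) if i not in y]
        if not unlabeled:
            break
        means, stds = gp_posterior([X[i] for i in labeled],
                                   [y[i] for i in labeled], X)
        pick = ucb_select(means, stds, unlabeled, beta)
        y[pick] = oracle(pick)  # one "experiment" per round
        labeled.append(pick)
    return y


if __name__ == "__main__":
    # Toy 1-D "embeddings" with a single fitness peak at x = 2.0 (made-up data).
    X = [[i / 10] for i in range(30)]
    fitness = lambda i: math.exp(-(X[i][0] - 2.0) ** 2)
    found = active_learning(X, fitness)
    best = max(found, key=found.get)
    print("labeled:", sorted(found), "best index:", best)
```

Setting `beta=0` turns the rule into pure greedy exploitation of the predicted mean, while a larger `beta` favors variants whose fitness the model is still unsure about; this is the kind of broader exploration the abstract attributes to uncertainty-aware acquisition.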
Source journal
CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months
Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.