A systematic evaluation of the language-of-viral-escape model using multiple machine learning frameworks

Brent Allman, Luiz Vieira, Daniel J Diaz, Claus O Wilke
bioRxiv - Bioinformatics, published 2024-09-08. DOI: 10.1101/2024.09.04.611278 (https://doi.org/10.1101/2024.09.04.611278)

Abstract

Predicting the evolutionary patterns of emerging and endemic viruses is key for mitigating their spread in host populations. In particular, it is critical to rapidly identify mutations with the potential for immune escape or increased disease burden (variants of concern). Knowing which circulating mutations are such variants of concern can inform treatment or mitigation strategies such as alternative vaccines or targeted social distancing. A recent study proposed that variants of concern can be identified using two quantities extracted from protein language models, grammaticality and semantic change. These quantities are defined in analogy to concepts from natural language processing. Grammaticality is intended to be a measure of whether a variant viral protein is viable, and semantic change is intended to be a measure of potential for immune escape. Here, we systematically test this hypothesis, taking advantage of several high-throughput datasets that have become available, and also testing additional machine learning models for calculating the grammaticality metric. We find that grammaticality can be a measure of protein viability, though the more traditional metric ΔΔG appears to be more effective. By contrast, we do not find compelling evidence that semantic change is a useful tool for identifying immune escape mutations.
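The two quantities named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions here are assumptions taken from the earlier language-of-viral-escape proposal: grammaticality is the probability a language model assigns to the mutant residue at the mutated site, and semantic change is the L1 distance between the wild-type and mutant sequence embeddings. The function names, the toy logits, and the toy embeddings are hypothetical; a real analysis would take logits and embeddings from a trained protein language model.

```python
import math

# The 20 standard amino acids, in a fixed (assumed) vocabulary order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Convert raw per-residue scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grammaticality(site_logits, mutant_residue):
    """Assumed definition: model probability of the mutant residue at the site."""
    probs = softmax(site_logits)
    return probs[AMINO_ACIDS.index(mutant_residue)]


def semantic_change(wildtype_embedding, mutant_embedding):
    """Assumed definition: L1 distance between the two sequence embeddings."""
    if len(wildtype_embedding) != len(mutant_embedding):
        raise ValueError("embeddings must have the same dimension")
    return sum(abs(w - m) for w, m in zip(wildtype_embedding, mutant_embedding))


# Toy inputs (hypothetical, not from any real model or dataset).
uniform_logits = [0.0] * len(AMINO_ACIDS)  # model has no preference at this site
wildtype_vec = [0.0, 1.0, 2.0]
mutant_vec = [1.0, 1.0, 0.0]

print(grammaticality(uniform_logits, "K"))        # each residue gets 1/20
print(semantic_change(wildtype_vec, mutant_vec))  # |0-1| + |1-1| + |2-0|
```

Under these assumed definitions, a mutation scores as a candidate escape variant when both values are high: the model considers the residue plausible at that site, and the mutant embedding is far from the wild type.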