Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.

Simone Rancati, Giovanna Nicora, Mattia Prosperi, Riccardo Bellazzi, Marco Salemi, Simone Marini
{"title":"Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.","authors":"Simone Rancati, Giovanna Nicora, Mattia Prosperi, Riccardo Bellazzi, Marco Salemi, Simone Marini","doi":"10.1101/2023.10.24.563721","DOIUrl":null,"url":null,"abstract":"<p><p>The coronavirus disease of 2019 (COVID-19) pandemic is characterized by sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, lineages, and sublineages, outcompeting previously circulating ones because of, among other factors, increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute more than 10% of all the viral sequences added to the GISAID database on a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of about 4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01% - 3%), with median lead times of 4-17 weeks, and predicts FDLs ~5 and ~25 times better than a baseline approach For example, the B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of public health <i>pre-emptive</i> intervention strategies.</p>","PeriodicalId":72407,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634784/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv : the preprint server for biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2023.10.24.563721","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The coronavirus disease of 2019 (COVID-19) pandemic is characterized by sequential emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants, lineages, and sublineages, outcompeting previously circulating ones because of, among other factors, increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute more than 10% of all the viral sequences added to the GISAID database on a given week. DeepAutoCoV is trained and validated by assembling global and country-specific data sets from over 16 million Spike protein sequences sampled over a period of about 4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01% - 3%), with median lead times of 4-17 weeks, and predicts FDLs ~5 and ~25 times better than a baseline approach For example, the B.1.617.2 vaccine reference strain was flagged as FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV outputs interpretable results by pinpointing specific mutations potentially linked to increased fitness, and may provide significant insights for the optimization of public health pre-emptive intervention strategies.

利用深度自动编码器异常检测预测SARS-CoV-2谱系的优势性。
COVID-19大流行表明,需要一个快速、有效的基于基因组的监测系统来预测新出现的SARS-CoV-2变体和谱系。利用公共卫生监测或综合序列数据库的传统分子流行病学方法能够表征感染波和遗传进化的进化史,但在预测病毒遗传改变的未来前景方面存在不足。为了弥补这一差距,我们引入了一种新的基于深度学习、自动编码器的SARS-CoV-2异常检测方法(DeepAutoCov)。对全球公共SARS-CoV-2 GISAID数据库进行培训并更新。DeepAutoCov识别未来优势谱系(fdl),定义为每周使用Spike (S)蛋白,每周添加至少25%的SARS-CoV-2基因组的谱系。我们的算法基于通过无监督方法进行异常检测,这是必要的,因为fdl只能是后验的(即,在它们成为主导之后)。我们开发了两种并发方法(线性无监督和后验监督)来评估DeepAutoCoV的性能。DeepAutoCoV使用刺突(S)蛋白识别FDL,在全球数据上的中位提前期为31周,比其他方法获得的阳性预测值高出约7倍,高出23%。此外,它还可以提前17个月预测与疫苗相关的fdl。最后,DeepAutoCoV不仅具有预测性,而且具有可解释性,因为它可以确定fdl中的特定突变,从而产生关于谱系毒性或传播性潜在增加的假设。通过将基因组监测与人工智能相结合,我们的工作标志着一个变革性的步骤,可能为优化公共卫生预防和干预策略提供有价值的见解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信