WaveSeekerNet: accurate prediction of influenza A virus subtypes and host source using attention-based deep learning.

IF 11.8 | Zone 2, Biology | Q1, MULTIDISCIPLINARY SCIENCES
Hoang-Hai Nguyen, Josip Rudar, Nathaniel Lesperance, Oksana Vernygora, Graham W Taylor, Chad Laing, David Lapen, Carson K Leung, Oliver Lung
Journal: GigaScience, volume 14
Publication type: Journal Article
Publication date: 2025-01-06
DOI: 10.1093/gigascience/giaf089 (https://doi.org/10.1093/gigascience/giaf089)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395966/pdf/
Citations: 0

Abstract


Background: Influenza A virus (IAV) poses a significant threat to animal health globally because it can cross species barriers and cause pandemics. Rapid and accurate prediction of IAV subtypes and host source is crucial for effective surveillance and pandemic preparedness. Deep learning has emerged as a powerful tool for analyzing viral genomic sequences, offering new ways to uncover hidden patterns associated with viral characteristics and host adaptation.

Findings: We introduce WaveSeekerNet, a novel deep learning model for accurate and rapid prediction of IAV subtypes and host source. The model combines attention-based mechanisms with efficient token mixing schemes, including the Fourier Transform and the Wavelet Transform, to capture intricate patterns within viral RNA and protein sequences. Extensive experiments on diverse datasets show that WaveSeekerNet outperforms existing models that use the traditional self-attention mechanism. Notably, WaveSeekerNet rivals VADR (Viral Annotation DefineR) in subtype prediction on high-quality RNA sequences, achieving the maximum score of 1.0 on Balanced Accuracy, F1-score (Macro Average), and Matthews Correlation Coefficient. Our approach to subtype and host source prediction also outperforms the pretrained ESM-2 (Evolutionary Scale Modeling) models in both generalization performance and computational cost. Furthermore, WaveSeekerNet distinguishes accurately between human, avian, and other mammalian hosts. Its ability to flag potential cross-species transmission events underscores its value for real-time surveillance and proactive pandemic preparedness efforts.
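The three metrics named in the Findings can be made concrete with a small worked example. The sketch below is not the authors' evaluation code; it is a minimal pure-Python illustration using the standard definitions (balanced accuracy as mean per-class recall, macro-averaged F1, and the multiclass generalization of the Matthews Correlation Coefficient). The function name `classification_scores` and the host labels are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import sqrt


def classification_scores(y_true, y_pred):
    """Return (balanced accuracy, macro F1, multiclass MCC) for two label lists.

    Hypothetical helper for illustration; not part of WaveSeekerNet.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # samples per true class
    pred_counts = Counter(y_pred)   # samples per predicted class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class

    recalls, f1s = [], []
    for label in labels:
        tp = hits[label]
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        if true_counts[label]:
            # Balanced accuracy averages recall over classes present in y_true.
            recalls.append(recall)
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)

    balanced_accuracy = sum(recalls) / len(recalls)
    macro_f1 = sum(f1s) / len(f1s)

    # Multiclass MCC from the confusion-matrix marginals.
    s = len(y_true)                 # total samples
    c = sum(hits.values())          # correctly classified samples
    cov_tp = c * s - sum(pred_counts[k] * true_counts[k] for k in labels)
    cov_pp = s * s - sum(pred_counts[k] ** 2 for k in labels)
    cov_tt = s * s - sum(true_counts[k] ** 2 for k in labels)
    mcc = cov_tp / sqrt(cov_pp * cov_tt) if cov_pp and cov_tt else 0.0
    return balanced_accuracy, macro_f1, mcc


# Hypothetical host labels, used only to exercise the function.
y_true = ["human", "human", "avian", "avian", "avian", "mammal"]
print(classification_scores(y_true, list(y_true)))  # perfect predictions -> (1.0, 1.0, 1.0)

# One human sample misclassified as avian lowers all three scores.
y_pred = ["human", "avian", "avian", "avian", "avian", "mammal"]
print(classification_scores(y_true, y_pred))
```

All three metrics are insensitive to class imbalance in ways that plain accuracy is not, which is presumably why they are reported together: a score of 1.0 on each requires every class, including rare ones, to be predicted without error.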

Conclusions: WaveSeekerNet's performance, efficiency, and ability to flag potential cross-species transmission events highlight its potential for real-time surveillance and pandemic preparedness. This model represents a significant advance in applying deep learning to IAV classification and holds promise for future epidemiological and veterinary studies and for public health interventions.

Source journal: GigaScience (MULTIDISCIPLINARY SCIENCES)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.