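PREDAC-FluB: predicting antigenic clusters of seasonal influenza B viruses with protein language model embedding based convolutional neural network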

IF 7.7 | Zone 2 (Biology) | JCR Q1 (Biochemical Research Methods)
Wenping Xie, Jingze Liu, Chuan Wang, Jiangyuan Wang, Wenjie Han, Yousong Peng, Xiangjun Du, Jing Meng, Kang Ning, Taijiao Jiang
Journal: Briefings in Bioinformatics, volume 26, issue 4. Published 2025-07-02. Publication type: Journal Article.
DOI: https://doi.org/10.1093/bib/bbaf308
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12264208/pdf/
Citations: 0

PREDAC-FluB: predicting antigenic clusters of seasonal influenza B viruses with protein language model embedding based convolutional neural network.

Influenza poses a significant global public health threat, and vaccination is the most effective and economical preventive measure. However, punctuated antigenic changes in the virus, particularly in hemagglutinin (HA), allow it to escape the immunity induced by prior infection or vaccination. Accurately predicting antigenic variation and understanding the antigenic dynamics of influenza viruses are crucial for selecting appropriate vaccine strains, but no established methods exist for influenza B viruses. We therefore present PREDAC-FluB, a hybrid deep learning framework that integrates three components: spatial feature extraction via a convolutional neural network (CNN) to model interactions in HA1 sequences; a multimodal sequence representation combining ESM-2 embeddings with six physicochemical descriptors and continuous encoding (ESM2-7-features); and UMAP-guided clustering for antigenic cluster identification. Using data from 9036 B/Victoria-lineage and 4520 B/Yamagata-lineage influenza virus pairs, PREDAC-FluB outperforms traditional machine learning methods in predicting antigenic variation and successfully identifies the major antigenic clusters. Specifically, it classifies the B/Victoria lineage into nine antigenic clusters and the B/Yamagata lineage into three. In five-fold cross-validation for B/Victoria viruses, PREDAC-FluB with ESM2-7-features encoding achieved AUROC values of 0.9961 on the validation set and 0.9856 on the independent test set. In retrospective testing for B/Victoria viruses, it achieved AUROC values ranging from 0.83 to 0.97, indicating that the model captures antigenic variation information with high prediction accuracy. In conclusion, PREDAC-FluB is a robust tool for antigenic computation that accurately predicts antigenic variation in influenza B viruses, and its accuracy makes it a promising auxiliary method for recommending future influenza vaccine strains.
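
To make the pairwise setup and the AUROC figures above more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names `encode_pair` and `auroc` are hypothetical, the Kyte-Doolittle hydropathy scale is only a stand-in for the six physicochemical descriptors (which the abstract does not name), and the ESM-2 embedding, CNN, and UMAP steps are omitted entirely. It only shows how two aligned HA1 sequences could become per-position features, and how AUROC can be computed from pair labels and predicted scores.

```python
from typing import Dict, List, Sequence

# Kyte-Doolittle hydropathy scale. ASSUMPTION: used only as an example
# descriptor; the abstract does not say which six descriptors PREDAC-FluB uses.
KD_HYDROPATHY: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_pair(seq_a: str, seq_b: str) -> List[List[float]]:
    """Per-position features for one virus pair (hypothetical encoding).

    Each position yields [mismatch flag, absolute hydropathy difference].
    Only the 20 standard residues are supported; gaps raise KeyError.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    features = []
    for a, b in zip(seq_a, seq_b):
        mismatch = 0.0 if a == b else 1.0
        diff = abs(KD_HYDROPATHY[a] - KD_HYDROPATHY[b])
        features.append([mismatch, diff])
    return features


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive pair (label 1)
    receives a higher score than a random negative pair (label 0);
    ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy aligned fragments, not real HA1 sequences.
    print(encode_pair("AKT", "ART"))
    # Toy labels (1 = antigenically distinct pair) and made-up scores.
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

In a real pipeline the per-position features would be stacked into a matrix and passed to the CNN; the quadratic loop in `auroc` is chosen for readability and would normally be replaced by a rank-based computation or a library routine.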

Source journal: Briefings in Bioinformatics
Category: Biology, Biochemical Research Methods
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
About the journal: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.