Wenping Xie, Jingze Liu, Chuan Wang, Jiangyuan Wang, Wenjie Han, Yousong Peng, Xiangjun Du, Jing Meng, Kang Ning, Taijiao Jiang
Briefings in Bioinformatics, 26(4). Published 2025-07-02. DOI: 10.1093/bib/bbaf308 (https://doi.org/10.1093/bib/bbaf308). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12264208/pdf/
PREDAC-FluB: predicting antigenic clusters of seasonal influenza B viruses with protein language model embedding based convolutional neural network.
Influenza poses a significant global public health threat, with vaccination being the most effective and economical preventive measure. However, punctuated antigenic changes, particularly in hemagglutinin (HA), result in escape from the immunity induced by prior infection or vaccination. Accurately predicting antigenic variation and understanding the antigenic dynamics of influenza viruses are crucial for selecting appropriate vaccine strains, but no established methods exist for influenza B viruses. Therefore, we present PREDAC-FluB, a hybrid deep learning framework that integrates spatial feature extraction via a convolutional neural network (CNN) to model interactions in HA1 sequences, multimodal sequence representation combining ESM-2 embeddings with six physicochemical descriptors and continuous encoding (ESM2-7-features), and UMAP-guided clustering for antigenic cluster identification. Using data from 9036 B/Victoria-lineage and 4520 B/Yamagata-lineage influenza virus pairs, PREDAC-FluB demonstrates superior performance over traditional machine learning methods in predicting antigenic variation in influenza viruses, successfully identifying major antigenic clusters. Specifically, PREDAC-FluB classified the B/Victoria lineage into nine antigenic clusters and the B/Yamagata lineage into three antigenic clusters. In five-fold cross-validation for B/Victoria viruses, PREDAC-FluB with ESM2-7-features encoding achieved AUROC values of 0.9961 on the validation set and 0.9856 on the independent test set. In retrospective testing for B/Victoria viruses, PREDAC-FluB achieved AUROC values ranging from 0.83 to 0.97, demonstrating high prediction accuracy and effectively capturing antigenic variation information. In conclusion, PREDAC-FluB is a robust tool for antigenic computation, capable of accurately predicting antigenic variation in influenza B viruses. Its high prediction accuracy makes it a promising auxiliary method for recommending future influenza vaccine strains.
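The AUROC figures quoted in the abstract summarize how well a model's scores rank antigenically distinct virus pairs above antigenically similar ones. As a minimal sketch of that metric only (not the authors' evaluation code), the function below computes AUROC from the rank-sum (Mann-Whitney U) identity. The label convention (1 = antigenically distinct pair, 0 = similar pair), the function name `auroc`, and the toy scores are assumptions made for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a positive pair, 0 for a negative pair (assumed convention).
    scores: model outputs, where higher means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Sort indices by score, then give tied scores their average 1-based rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts (positive, negative) pairs where the positive scores higher,
    # with ties counted as one half.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy data, invented for illustration; not taken from the paper.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.3, 0.1]
    print(auroc(toy_labels, toy_scores))
```

On the toy data, three of the four (positive, negative) pairs are ordered correctly, so the function returns 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.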
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.