DPCIPI: A pre-trained deep learning model for predicting cross-immunity between drifted strains of Influenza A/H3N2

Yiming Du, Zhuotian Li, Qian He, Thomas Wetere Tulu, Kei Hang Katie Chan, Lin Wang, Sen Pei, Zhanwei Du, Zhen Wang, Xiao-Ke Xu, Xiao Fan Liu
{"title":"DPCIPI: A pre-trained deep learning model for predicting cross-immunity between drifted strains of Influenza A/H3N2","authors":"Yiming Du ,&nbsp;Zhuotian Li ,&nbsp;Qian He ,&nbsp;Thomas Wetere Tulu ,&nbsp;Kei Hang Katie Chan ,&nbsp;Lin Wang ,&nbsp;Sen Pei ,&nbsp;Zhanwei Du ,&nbsp;Zhen Wang ,&nbsp;Xiao-Ke Xu ,&nbsp;Xiao Fan Liu","doi":"10.1016/j.jai.2025.03.004","DOIUrl":null,"url":null,"abstract":"<div><div>Predicting cross-immunity between viral strains is vital for public health surveillance and vaccine development. Traditional neural network methods, such as BiLSTM, could be ineffective due to the lack of lab data for model training and the overshadowing of crucial features within sequence concatenation. The current work proposes a less data-consuming model incorporating a pre-trained gene sequence model and a mutual information inference operator. Our methodology utilizes gene alignment and deduplication algorithms to preprocess gene sequences, enhancing the model’s capacity to discern and focus on distinctions among input gene pairs. The model, i.e., DNA Pretrained Cross-Immunity Protection Inference model (DPCIPI), outperforms state-of-the-art (SOTA) models in predicting hemagglutination inhibition titer from influenza viral gene sequences only. Improvement in binary cross-immunity prediction is 1.58% in F1, 2.34% in precision, 1.57% in recall, and 1.57% in Accuracy. For multilevel cross-immunity improvements, the improvement is 2.12% in F1, 3.50% in precision, 2.19% in recall, and 2.19% in Accuracy. Our study showcases the potential of pre-trained gene models to improve predictions of antigenic variation and cross-immunity. With expanding gene data and advancements in pre-trained models, this approach promises significant impacts on vaccine development and public health.</div></div>","PeriodicalId":100755,"journal":{"name":"Journal of Automation and Intelligence","volume":"4 2","pages":"Pages 115-124"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Automation and Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S294985542500019X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Predicting cross-immunity between viral strains is vital for public health surveillance and vaccine development. Traditional neural network methods, such as BiLSTM, can be ineffective because laboratory data for model training are scarce and because crucial features are overshadowed when sequences are simply concatenated. This work proposes a less data-hungry model that combines a pre-trained gene-sequence model with a mutual-information inference operator. Our methodology uses gene alignment and deduplication algorithms to preprocess gene sequences, enhancing the model's capacity to discern and focus on the differences between input gene pairs. The resulting model, the DNA Pretrained Cross-Immunity Protection Inference model (DPCIPI), outperforms state-of-the-art (SOTA) models in predicting hemagglutination inhibition titers from influenza viral gene sequences alone. For binary cross-immunity prediction, the improvement is 1.58% in F1, 2.34% in precision, 1.57% in recall, and 1.57% in accuracy; for multilevel cross-immunity prediction, it is 2.12% in F1, 3.50% in precision, 2.19% in recall, and 2.19% in accuracy. Our study showcases the potential of pre-trained gene models to improve predictions of antigenic variation and cross-immunity. With expanding gene data and advances in pre-trained models, this approach promises significant impact on vaccine development and public health.
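For illustration only, the sketch below shows, in PyTorch, the general shape of the pipeline the abstract describes: two aligned HA gene sequences are each encoded by a pre-trained DNA-sequence encoder (represented here by a small placeholder transformer, not the authors' released model), and a pairwise head that combines the two embeddings with their element-wise difference predicts a cross-immunity class. All class names, dimensions, and the toy inputs are assumptions made for this sketch, not details taken from the paper.

```python
# Minimal, hypothetical sketch of a DPCIPI-style pairwise pipeline.
# The encoder is a stand-in for a pre-trained gene-sequence model; in
# practice its weights would be loaded from a pre-training checkpoint.
import torch
import torch.nn as nn


class PretrainedDNAEncoder(nn.Module):
    """Placeholder for a pre-trained DNA language model."""

    def __init__(self, vocab_size: int = 6, hidden: int = 256):
        super().__init__()
        # Tokens: A, C, G, T, alignment gap, padding (illustrative vocabulary).
        self.embed = nn.Embedding(vocab_size, hidden)
        layer = nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) integer-encoded, aligned nucleotides.
        h = self.encoder(self.embed(tokens))   # (batch, seq_len, hidden)
        return h.mean(dim=1)                   # pooled sequence embedding


class CrossImmunityHead(nn.Module):
    """Pairwise head: uses both strain embeddings and their absolute
    difference (emphasising positions that differ after alignment and
    deduplication) to predict a cross-immunity class."""

    def __init__(self, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden * 3, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([a, b, (a - b).abs()], dim=-1))


if __name__ == "__main__":
    encoder = PretrainedDNAEncoder()
    head = CrossImmunityHead(num_classes=2)   # 2 for binary, more for multilevel
    strain_a = torch.randint(0, 4, (1, 128))  # toy aligned sequences
    strain_b = torch.randint(0, 4, (1, 128))
    logits = head(encoder(strain_a), encoder(strain_b))
    print(logits.shape)                       # torch.Size([1, 2])
```

In this sketch the number of output classes switches between the binary and multilevel prediction settings mentioned in the abstract; how the paper actually forms the pairwise representation and defines HI-titer classes should be taken from the original article.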