Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses

IF 2.2 | Zone 2, Agricultural and Forestry Sciences | Q1, VETERINARY SCIENCES
Famke Alberts, Olaf Berke, Grazieli Maboni, Tatiana Petukhova, Zvonimir Poljak
Preventive Veterinary Medicine, volume 233, Article 106351. Journal Article, published 2024-09-26.
DOI: 10.1016/j.prevetmed.2024.106351
https://www.sciencedirect.com/science/article/pii/S016758772400237X
Citations: 0

Abstract

Influenza is a disease with pandemic potential that poses both a public health and an agricultural risk. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers that identify the most likely host species from the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploring the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy, with a 98.0% (95% CI [97.01, 98.73]) correct classification rate on an independent validation dataset. The classifications were further analyzed using the predicted probability score, which highlighted sequences of particular interest: sequences, both correctly and incorrectly classified, that showed considerable predicted probability for multiple hosts. This showed the potential of these classifiers for rapid sequence classification and for highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from the United States of America from 1998 to 2003, and on a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were used to examine the applications of predicted probability and host convergence over time. Lastly, the classifiers were applied to an independent dataset of environmental sequences to explore their host identification. The results show the potential for machine learning to be used as a species-level host identification technique for viruses of unknown origin.
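
The abstract describes using predicted probability scores to highlight sequences with considerable probability for more than one host. A minimal sketch of that idea in Python follows. It is not the authors' implementation: the function name `call_host`, the 0.2 threshold, the host labels, and the probabilities are all hypothetical and chosen only for illustration. It assumes each sequence already has per-host class probabilities from some trained classifier.

```python
from typing import Dict, List, NamedTuple


class HostCall(NamedTuple):
    predicted_host: str          # host with the highest predicted probability
    candidate_hosts: List[str]   # hosts at or above the reporting threshold
    ambiguous: bool              # True when more than one host is a candidate


def call_host(probabilities: Dict[str, float], threshold: float = 0.2) -> HostCall:
    """Summarize one sequence's per-host predicted probabilities.

    `threshold` is a hypothetical cut-off for "considerable" probability;
    the source does not state a value.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    # Rank hosts from most to least probable.
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    predicted_host = ranked[0][0]
    # Keep every host whose probability reaches the threshold.
    candidates = [host for host, p in ranked if p >= threshold]
    return HostCall(predicted_host, candidates, len(candidates) > 1)


# Made-up probabilities for illustration only; these are not study results.
toy_scores = {
    "seq_A": {"avian": 0.97, "swine": 0.02, "human": 0.01},
    "seq_B": {"avian": 0.55, "canine": 0.40, "swine": 0.05},
}

calls = {name: call_host(p) for name, p in toy_scores.items()}
flagged = [name for name, call in calls.items() if call.ambiguous]
print(flagged)
```

In this toy input, `seq_B` is flagged because two hosts exceed the assumed threshold, even though its top-ranked host would still be reported as the classification. A flag of this kind marks a sequence for closer inspection and does not change the predicted class.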
Source journal
Preventive Veterinary Medicine (Agricultural and Forestry Sciences, Veterinary Science)
CiteScore: 5.60
Self-citation rate: 7.70%
Publication volume: 184
Review time: 3 months
About the journal: Preventive Veterinary Medicine is one of the leading international resources for scientific reports on animal health programs and preventive veterinary medicine. The journal follows the guidelines for standardizing and strengthening the reporting of biomedical research which are available from the CONSORT, MOOSE, PRISMA, REFLECT, STARD, and STROBE statements. The journal focuses on: epidemiology of health events relevant to domestic and wild animals; economic impacts of epidemic and endemic animal and zoonotic diseases; latest methods and approaches in veterinary epidemiology; disease and infection control or eradication measures; the "One Health" concept and the relationships between veterinary medicine, human health, animal-production systems, and the environment; development of new techniques in surveillance systems and diagnosis; evaluation and control of diseases in animal populations.