Famke Alberts , Olaf Berke , Grazieli Maboni , Tatiana Petukhova , Zvonimir Poljak
Preventive Veterinary Medicine, volume 233, Article 106351. Published 2024-09-26. DOI: 10.1016/j.prevetmed.2024.106351. https://www.sciencedirect.com/science/article/pii/S016758772400237X
Utilizing machine learning and hemagglutinin sequences to identify likely hosts of influenza H3Nx viruses
Influenza poses both a public health and an agricultural risk and has pandemic potential. Among the subtypes of influenza A virus, H3 influenza virus can infect many avian and mammalian species and is therefore a virus of interest to human and veterinary public health. The primary goal of this study was to train and validate classifiers for the identification of the most likely host species using the hemagglutinin gene segment of H3 viruses. A five-step process was implemented, which included training four machine learning classifiers, testing the classifiers on the validation dataset, and further exploration of the best-performing model on three additional datasets. The gradient boosting machine classifier showed the highest host-classification accuracy, with a correct classification rate of 98.0% (95% CI [97.01, 98.73]) on an independent validation dataset. The classifications were further analyzed using the predicted probability score, which highlighted sequences of particular interest: sequences, both correctly and incorrectly classified, that showed considerable predicted probability for multiple hosts. This demonstrates the potential of these classifiers for rapid sequence classification and for highlighting sequences of interest. Additionally, the classifiers were tested on a separate swine dataset composed of H3N2 sequences from 1998 to 2003 from the United States of America, and on a separate canine dataset composed of canine H3N2 sequences of avian origin. These two datasets were used to examine applications of the predicted probability and host convergence over time. Lastly, the classifiers were applied to an independent dataset of environmental sequences to explore host identification for such sequences. The results show the potential for machine learning to be used as a host identification technique for viruses of unknown origin at a species-specific level.
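The idea of using predicted probability scores to highlight sequences with considerable probability for more than one host can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `flag_ambiguous`, the `min_second` threshold of 0.2, the host labels, and the probabilities are all hypothetical and chosen only for illustration. In practice the probabilities would come from a trained classifier.

```python
from typing import Dict, List, Tuple


def flag_ambiguous(
    predictions: Dict[str, Dict[str, float]],
    min_second: float = 0.2,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[str, str, str, float]]:
    """Flag sequences whose runner-up host has a notable predicted probability.

    Returns tuples of (sequence_id, top_host, runner_up_host, runner_up_prob).
    """
    flagged = []
    for seq_id, probs in predictions.items():
        # Rank hosts from most to least probable for this sequence.
        ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # a single-class prediction cannot be ambiguous
        (top_host, _), (second_host, second_p) = ranked[0], ranked[1]
        if second_p >= min_second:
            flagged.append((seq_id, top_host, second_host, second_p))
    return flagged


# Made-up per-host probabilities for two hypothetical sequences.
example = {
    "seq_A": {"avian": 0.97, "swine": 0.02, "human": 0.01},
    "seq_B": {"avian": 0.55, "canine": 0.40, "swine": 0.05},
}

print(flag_ambiguous(example))
```

Here `seq_A` is a confident call and is not flagged, while `seq_B` is flagged because its second-ranked host still receives a probability of 0.40. A flagged sequence may be correctly or incorrectly classified; the flag only marks it for closer inspection.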
About the journal:
Preventive Veterinary Medicine is one of the leading international resources for scientific reports on animal health programs and preventive veterinary medicine. The journal follows the guidelines for standardizing and strengthening the reporting of biomedical research which are available from the CONSORT, MOOSE, PRISMA, REFLECT, STARD, and STROBE statements. The journal focuses on:
Epidemiology of health events relevant to domestic and wild animals;
Economic impacts of epidemic and endemic animal and zoonotic diseases;
Latest methods and approaches in veterinary epidemiology;
Disease and infection control or eradication measures;
The "One Health" concept and the relationships between veterinary medicine, human health, animal-production systems, and the environment;
Development of new techniques in surveillance systems and diagnosis;
Evaluation and control of diseases in animal populations.