Deyvid Amgarten, Bruno Koshin Vázquez Iha, Carlos Morais Piroupo, Aline Maria da Silva, João Carlos Setubal
PHAGE (New Rochelle, N.Y.), vol. 3, no. 4, pp. 204-212. Published 2022-12-01 (Epub 2022-12-19). DOI: 10.1089/phage.2021.0016 (https://doi.org/10.1089/phage.2021.0016). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9917316/pdf/
vHULK, a New Tool for Bacteriophage Host Prediction Based on Annotated Genomic Features and Neural Networks.
Background: The experimental determination of a bacteriophage host is a laborious procedure. Thus, there is a pressing need for reliable computational predictions of bacteriophage hosts.
Materials and methods: We developed the program vHULK for phage host prediction based on 9504 phage genome features, which consider alignment significance scores between predicted proteins and a curated database of viral protein families. The features were fed to a neural network, and two models were trained to predict 77 host genera and 118 host species.
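To make the pipeline described above more concrete, here is a minimal sketch, not the authors' implementation. It assumes that each of the 9504 features is the best -log10(E-value) observed for one viral protein family across all predicted proteins of a genome, and it replaces the trained neural network with a single dense layer followed by a softmax. The function names (`genome_features`, `predict_host`), the hit format, and the weights are all hypothetical.

```python
import math

N_FAMILIES = 9504  # feature count stated in the abstract; the layout is assumed


def genome_features(hits, n_families=N_FAMILIES):
    """Turn (family_index, e_value) hits into a fixed-length feature vector.

    Assumption: each feature is the best -log10(E-value) seen for one
    protein family across all predicted proteins of the genome.
    Families without a significant hit keep the value 0.0.
    """
    features = [0.0] * n_families
    for family_index, e_value in hits:
        # Clamp the E-value to avoid log(0) for extremely significant hits.
        score = -math.log10(max(e_value, 1e-300))
        features[family_index] = max(features[family_index], score)
    return features


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def predict_host(features, weights, biases, labels):
    """Single dense layer plus softmax; a stand-in for the real network."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]
```

With a toy setup of four families and two candidate genera, `genome_features([(0, 1e-10), (2, 1e-3)], n_families=4)` yields a four-element vector that `predict_host` maps to the label whose weight row matches the strongest hits. A real model would learn the weights from labelled phage genomes and would have one output per host genus or species.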
Results: In controlled random test sets with 90% redundancy reduction in terms of protein similarity, vHULK obtained on average 83% precision and 79% recall at the genus level, and 71% precision and 67% recall at the species level. The performance of vHULK was compared against three other tools on a test data set with 2153 phage genomes. On this data set, vHULK achieved better performance at both the genus and the species levels than the other tools.
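For readers less familiar with the metrics reported above, the following sketch shows one common way to compute precision and recall for a multi-class host predictor. It assumes per-class scores that are then macro-averaged; the abstract does not say which averaging scheme was used, so this is an illustration of the metrics rather than a reproduction of the authors' evaluation. The function names are hypothetical.

```python
from collections import Counter


def per_class_precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1  # predicted this host, but it was wrong
            fn[truth] += 1  # missed the true host
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-class precision and recall (an assumption)."""
    n = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / n
    mean_recall = sum(r for _, r in scores.values()) / n
    return mean_precision, mean_recall
```

Calling `macro_average(per_class_precision_recall(true_hosts, predicted_hosts))` on two equal-length lists of genus or species names returns a `(precision, recall)` pair. Macro-averaging gives every host the same weight regardless of how many phages infect it, which matters when a few hosts dominate the data set.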
Conclusions: Our results suggest that vHULK represents an advance on the state of the art in phage host prediction.