RNAVirHost:一种通过病毒基因组预测 RNA 病毒宿主的基于机器学习的方法。

IF 11.8 2区 生物学 Q1 MULTIDISCIPLINARY SCIENCES
Guowei Chen, Jingzhe Jiang, Yanni Sun
{"title":"RNAVirHost:一种通过病毒基因组预测 RNA 病毒宿主的基于机器学习的方法。","authors":"Guowei Chen, Jingzhe Jiang, Yanni Sun","doi":"10.1093/gigascience/giae059","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The high-throughput sequencing technologies have revolutionized the identification of novel RNA viruses. Given that viruses are infectious agents, identifying hosts of these new viruses carries significant implications for public health and provides valuable insights into the dynamics of the microbiome. However, determining the hosts of these newly discovered viruses is not always straightforward, especially in the case of viruses detected in environmental samples. Even for host-associated samples, it is not always correct to assign the sample origin as the host of the identified viruses. The process of assigning hosts to RNA viruses remains challenging due to their high mutation rates and vast diversity.</p><p><strong>Results: </strong>In this study, we introduce RNAVirHost, a machine learning-based tool that predicts the hosts of RNA viruses solely based on viral genomes. RNAVirHost is a hierarchical classification framework that predicts hosts at different taxonomic levels. We demonstrate the superior accuracy of RNAVirHost in predicting hosts of RNA viruses through comprehensive comparisons with various state-of-the-art techniques. When applying to viruses from novel genera, RNAVirHost achieved the highest accuracy of 84.3%, outperforming the alignment-based strategy by 12.1%.</p><p><strong>Conclusions: </strong>The application of machine learning models has proven beneficial in predicting hosts of RNA viruses. By integrating genomic traits and sequence homologies, RNAVirHost provides a cost-effective and efficient strategy for host prediction. We believe that RNAVirHost can greatly assist in RNA virus analyses and contribute to pandemic surveillance.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":null,"pages":null},"PeriodicalIF":11.8000,"publicationDate":"2024-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340644/pdf/","citationCount":"0","resultStr":"{\"title\":\"RNAVirHost: a machine learning-based method for predicting hosts of RNA viruses through viral genomes.\",\"authors\":\"Guowei Chen, Jingzhe Jiang, Yanni Sun\",\"doi\":\"10.1093/gigascience/giae059\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>The high-throughput sequencing technologies have revolutionized the identification of novel RNA viruses. Given that viruses are infectious agents, identifying hosts of these new viruses carries significant implications for public health and provides valuable insights into the dynamics of the microbiome. However, determining the hosts of these newly discovered viruses is not always straightforward, especially in the case of viruses detected in environmental samples. Even for host-associated samples, it is not always correct to assign the sample origin as the host of the identified viruses. The process of assigning hosts to RNA viruses remains challenging due to their high mutation rates and vast diversity.</p><p><strong>Results: </strong>In this study, we introduce RNAVirHost, a machine learning-based tool that predicts the hosts of RNA viruses solely based on viral genomes. RNAVirHost is a hierarchical classification framework that predicts hosts at different taxonomic levels. We demonstrate the superior accuracy of RNAVirHost in predicting hosts of RNA viruses through comprehensive comparisons with various state-of-the-art techniques. When applying to viruses from novel genera, RNAVirHost achieved the highest accuracy of 84.3%, outperforming the alignment-based strategy by 12.1%.</p><p><strong>Conclusions: </strong>The application of machine learning models has proven beneficial in predicting hosts of RNA viruses. By integrating genomic traits and sequence homologies, RNAVirHost provides a cost-effective and efficient strategy for host prediction. We believe that RNAVirHost can greatly assist in RNA virus analyses and contribute to pandemic surveillance.</p>\",\"PeriodicalId\":12581,\"journal\":{\"name\":\"GigaScience\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":11.8000,\"publicationDate\":\"2024-01-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11340644/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"GigaScience\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/gigascience/giae059\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae059","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

背景:高通量测序技术为鉴定新型 RNA 病毒带来了革命性的变化。鉴于病毒是传染性病原体,确定这些新病毒的宿主对公共卫生具有重大意义,并为了解微生物组的动态提供了宝贵的信息。然而,确定这些新发现病毒的宿主并不总是那么简单,尤其是在环境样本中检测到的病毒。即使是与宿主相关的样本,将样本来源定为已发现病毒的宿主也并非总是正确的。由于 RNA 病毒的高突变率和巨大的多样性,为 RNA 病毒确定宿主的过程仍然充满挑战:在本研究中,我们介绍了 RNAVirHost,这是一种基于机器学习的工具,它能仅根据病毒基因组预测 RNA 病毒的宿主。RNAVirHost 是一个分层分类框架,可预测不同分类级别的宿主。通过与各种最先进技术的综合比较,我们证明了 RNAVirHost 在预测 RNA 病毒宿主方面的卓越准确性。在应用于新属病毒时,RNAVirHost 的准确率最高,达到 84.3%,比基于比对的策略高出 12.1%:事实证明,机器学习模型的应用有利于预测 RNA 病毒的宿主。通过整合基因组特征和序列同源性,RNAVirHost 为宿主预测提供了一种经济高效的策略。我们相信,RNAVirHost 能极大地帮助 RNA 病毒分析,为大流行病监测做出贡献。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
RNAVirHost: a machine learning-based method for predicting hosts of RNA viruses through viral genomes.

Background: The high-throughput sequencing technologies have revolutionized the identification of novel RNA viruses. Given that viruses are infectious agents, identifying hosts of these new viruses carries significant implications for public health and provides valuable insights into the dynamics of the microbiome. However, determining the hosts of these newly discovered viruses is not always straightforward, especially in the case of viruses detected in environmental samples. Even for host-associated samples, it is not always correct to assign the sample origin as the host of the identified viruses. The process of assigning hosts to RNA viruses remains challenging due to their high mutation rates and vast diversity.

Results: In this study, we introduce RNAVirHost, a machine learning-based tool that predicts the hosts of RNA viruses solely based on viral genomes. RNAVirHost is a hierarchical classification framework that predicts hosts at different taxonomic levels. We demonstrate the superior accuracy of RNAVirHost in predicting hosts of RNA viruses through comprehensive comparisons with various state-of-the-art techniques. When applying to viruses from novel genera, RNAVirHost achieved the highest accuracy of 84.3%, outperforming the alignment-based strategy by 12.1%.

Conclusions: The application of machine learning models has proven beneficial in predicting hosts of RNA viruses. By integrating genomic traits and sequence homologies, RNAVirHost provides a cost-effective and efficient strategy for host prediction. We believe that RNAVirHost can greatly assist in RNA virus analyses and contribute to pandemic surveillance.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
GigaScience
GigaScience MULTIDISCIPLINARY SCIENCES-
CiteScore
15.50
自引率
1.10%
发文量
119
审稿时长
1 weeks
期刊介绍: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信