Virus-host interactions predictor (VHIP): Machine learning approach to resolve microbial virus-host interaction networks

IF 3.8; Zone 2, Biology; JCR Q1, Biochemical Research Methods
G. Eric Bastien, Rachel N. Cable, Cecelia Batterbee, A. J. Wing, Luis Zaman, Melissa B. Duhaime
PLoS Computational Biology, published 2024-09-18. DOI: 10.1371/journal.pcbi.1011649 (https://doi.org/10.1371/journal.pcbi.1011649)

Abstract

Viruses of microbes are ubiquitous biological entities that reprogram their hosts’ metabolisms during infection in order to produce viral progeny, impacting the ecology and evolution of microbiomes with broad implications for human and environmental health. Advances in genome sequencing have led to the discovery of millions of novel viruses and an appreciation for the great diversity of viruses on Earth. Yet, with knowledge of only “who is there?” we fall short in our ability to infer the impacts of viruses on microbes at population, community, and ecosystem scales. To do this, we need a more explicit understanding of “who do they infect?” Here, we developed a novel machine learning (ML) model, Virus-Host Interaction Predictor (VHIP), to predict virus-host interactions (infection/non-infection) from input virus and host genomes. This ML model was trained and tested on a high-value, manually curated set of 8849 virus-host pairs and their corresponding sequence data. The resulting dataset, ‘Virus Host Range network’ (VHRnet), is core to VHIP functionality. Each data point that underlies the VHIP training and testing represents a lab-tested virus-host pair in VHRnet, from which meaningful signals of viral adaptation to host were computed from genomic sequences. VHIP departs from existing virus-host prediction models in its ability to predict multiple interactions rather than predicting a single most likely host or host clade. As a result, VHIP is able to infer the complexity of virus-host networks in natural systems. VHIP has an 87.8% accuracy rate at predicting interactions between virus-host pairs at the species level and can be applied to novel viral and host population genomes reconstructed from metagenomic datasets.
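The pairwise framing described in the abstract (score each virus-host pair as infection or non-infection, rather than pick one best host per virus) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two features (GC-content difference and k-mer profile distance), the function names, the toy sequences, and the threshold rule standing in for a trained model. It is not VHIP's actual feature set or classifier.

```python
from itertools import product
from math import sqrt


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases
            counts[window] += 1
    total = sum(counts.values())
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


def pair_features(virus_seq: str, host_seq: str, k: int = 2) -> dict[str, float]:
    """Hypothetical genome-derived features for one virus-host pair."""
    v, h = kmer_profile(virus_seq, k), kmer_profile(host_seq, k)
    return {
        "gc_diff": abs(gc_content(virus_seq) - gc_content(host_seq)),
        "kmer_dist": sqrt(sum((a - b) ** 2 for a, b in zip(v, h))),
    }


def predict_network(viruses, hosts, classify):
    """Classify every virus-host pair independently.

    Returns the set of (virus, host) pairs predicted as infections, so a
    virus may be linked to zero, one, or several hosts.
    """
    return {
        (v_name, h_name)
        for v_name, v_seq in viruses.items()
        for h_name, h_seq in hosts.items()
        if classify(pair_features(v_seq, h_seq))
    }


# Toy sequences and a placeholder rule; a trained classifier would go here.
viruses = {"phage_A": "ATATATATAT"}
hosts = {
    "host_1": "ATTATAATAT",
    "host_2": "GCGCGGCCGC",
    "host_3": "TATATTTAAA",
}
network = predict_network(viruses, hosts, lambda f: f["gc_diff"] < 0.1)
print(sorted(network))
```

With these toy inputs the placeholder rule links phage_A to both host_1 and host_3. That is the point of the pairwise design: the output is a set of edges in a virus-host network, not a single top-ranked host.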
Source journal: PLoS Computational Biology (Biochemical Research Methods; Mathematical & Computational Biology)
CiteScore: 7.10
Self-citation rate: 4.70%
Articles published: 820
Review time: 2.5 months
Journal description: PLOS Computational Biology features works of exceptional significance that further our understanding of living systems at all scales, from molecules and cells to patient populations and ecosystems, through the application of computational methods. Readers include life and computational scientists, who can take the important findings presented here to the next level of discovery. Research articles must be declared as belonging to a relevant section. More information about the sections can be found in the submission guidelines. Research articles should model aspects of biological systems, demonstrate both methodological and scientific novelty, and provide profound new biological insights. Generally, reliability and significance of biological discovery through computation should be validated and enriched by experimental studies. Inclusion of experimental validation is not required for publication, but should be referenced where possible. Inclusion of experimental validation of a modest biological discovery through computation does not render a manuscript suitable for PLOS Computational Biology. Research articles specifically designated as Methods papers should describe outstanding methods of exceptional importance that have been shown to provide, or have the promise of providing, new biological insights. The method must already be widely adopted, or have the promise of wide adoption by a broad community of users. Enhancements to existing published methods will only be considered if those enhancements bring exceptional new capabilities.