VISTA: A Tool for Fast Taxonomic Assignment of Viral Genome Sequences

Tao Zhang, Yiyun Liu, Xutong Guo, Xinran Zhang, Xinchang Zheng, Mochen Zhang, Yiming Bao
Journal: Genomics, Proteomics & Bioinformatics
Publication date: 2024-11-14
Publication type: Journal Article
DOI: 10.1093/gpbjnl/qzae082 (https://doi.org/10.1093/gpbjnl/qzae082)
Abstract

The rapid expansion of the number of viral genome sequences in public databases necessitates a scalable, universal, and automated preliminary taxonomic framework for comprehensive virus studies. Here, we introduce VISTA (Virus Sequence-based Taxonomy Assignment), a computational tool that employs a novel pairwise sequence comparison system and an automatic demarcation threshold identification framework for virus taxonomy. Leveraging physio-chemical property sequences, k-mer profiles, and machine learning techniques, VISTA constructs a robust distance-based framework for taxonomic assignment. Functionally similar to PASC (Pairwise Sequence Comparison), a widely used virus assignment tool based on pairwise sequence comparison, VISTA demonstrates superior performance by providing significantly improved separation for taxonomic groups, more objective taxonomic demarcation thresholds, greatly enhanced speed, and a wider application scope. We successfully applied VISTA to 38 virus families, as well as to the class Caudoviricetes. This demonstrates VISTA's scalability, robustness, and ability to automatically and accurately assign taxonomy to both prokaryotic and eukaryotic viruses. Furthermore, the application of VISTA to 679 unclassified prokaryotic virus genomes recovered from metagenomic data identified 46 novel virus families. VISTA is available as both a command line tool and a user-friendly web portal at https://ngdc.cncb.ac.cn/vista.
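
To make the idea of a distance-based comparison over k-mer profiles concrete, here is a minimal Python sketch. It is not VISTA's implementation: the abstract does not specify the distance measure, the value of k, how the physio-chemical property sequences or machine learning components are combined, or how demarcation thresholds are derived. The function names (`kmer_profile`, `cosine_distance`, `same_group`), the use of cosine distance, and the caller-supplied `threshold` are all assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(seq, k=3):
    """Return normalized frequencies over all 4**k DNA k-mers (illustrative only)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # k-mers containing ambiguous bases (e.g. N) are ignored here.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def cosine_distance(u, v):
    """Cosine distance between two profiles; 1.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def same_group(seq_a, seq_b, threshold, k=3):
    """Hypothetical rule: two genomes share a taxon if their distance is within the threshold."""
    dist = cosine_distance(kmer_profile(seq_a, k), kmer_profile(seq_b, k))
    return dist <= threshold
```

In this sketch the threshold is passed in by the caller; the abstract states that VISTA identifies demarcation thresholds automatically, which the sketch does not attempt to reproduce.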
