From Trees to Clouds: PhageClouds for Fast Comparison of ∼640,000 Phage Genomic Sequences and Host-Centric Visualization Using Genomic Network Graphs.

PHAGE (New Rochelle, N.Y.). Pub Date: 2021-12-01; Epub Date: 2021-12-16; DOI: 10.1089/phage.2021.0008
Guillermo Rangel-Pineros, Andrew Millard, Slawomir Michniewski, David Scanlan, Kimmo Sirén, Alejandro Reyes, Bent Petersen, Martha R J Clokie, Thomas Sicheritz-Pontén
Pages: 194-203. Open-access PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/7d/81/phage.2021.0008.PMC9041511.pdf

Abstract

Background: Fast and computationally efficient strategies are required to explore genomic relationships within an increasingly large and diverse phage sequence space. Here, we present PhageClouds, a novel approach that uses a graph database of phage genomic sequences and their intergenomic distances to explore this space.

Methods: A total of 640,000 phage genomic sequences were retrieved from a variety of databases and public virome assemblies. Intergenomic distances were calculated with dashing, an alignment-free method suitable for handling massive data sets. These data were used to build a Neo4j® graph database.

Results: PhageClouds searched all complete phage genomes from GenBank for relatives of a single query phage in just 10 s. Moreover, PhageClouds expanded the number of closely related phage sequences detected for both finished and draft phage genomes, compared with searches exclusively targeting phage entries from GenBank.

Conclusions: PhageClouds is a novel resource that will facilitate the analysis of phage genomic sequences and the characterization of assembled phage genomes.
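The general idea in the Methods, alignment-free distances stored as edges of a genome graph and then queried for neighbours, can be illustrated with a toy sketch. This is not the PhageClouds implementation. It assumes exact k-mer sets and a plain Jaccard distance in place of dashing's sketch-based estimates, and an in-memory dictionary in place of a Neo4j database. The function names, the k-mer length, the distance cutoff, and the sequences in the usage note are all hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def kmer_set(seq: str, k: int = 4) -> set[str]:
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """Return 1 - |A & B| / |A | B|; 0.0 means identical k-mer content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_graph(genomes: dict[str, str], k: int = 4,
                max_dist: float = 0.5) -> dict[str, dict[str, float]]:
    """Link every pair of genomes whose distance is at most max_dist.

    The result is an undirected adjacency map: graph[u][v] == graph[v][u].
    The cutoff is an illustrative assumption, not a value from the paper.
    """
    kmers = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    graph: dict[str, dict[str, float]] = {name: {} for name in genomes}
    for u, v in combinations(genomes, 2):
        d = jaccard_distance(kmers[u], kmers[v])
        if d <= max_dist:
            graph[u][v] = d
            graph[v][u] = d
    return graph


def related(graph: dict[str, dict[str, float]],
            query: str) -> list[tuple[str, float]]:
    """Return the neighbours of query, closest first."""
    return sorted(graph[query].items(), key=lambda item: item[1])
```

For example, calling `build_graph` on three made-up sequences, two differing in a single base and one unrelated, and then `related(graph, name)` on one of the similar pair returns only its near-identical partner. All-against-all comparison as written here is quadratic in the number of genomes, which is why sketching methods and a persistent graph store matter at the scale the abstract describes.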

