PVGA: a precise viral genome assembler using an iterative alignment graph.

Impact factor 11.8; Zone 2 (Biology); JCR Q1 (Multidisciplinary Sciences)
Zhi Song, Dehan Cai, Yanni Sun, Lusheng Wang
GigaScience, volume 14. Journal Article. Published 2025-01-06. DOI: https://doi.org/10.1093/gigascience/giaf063. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206156/pdf/
Citations: 0

Abstract

Background: Viral genome analysis is crucial for understanding virus evolution and mutation. Investigations into viral evolutionary dynamics and mutation patterns have garnered significant research attention since the outbreak of COVID-19. The basic structure of many virus genomes is highly conserved [1]. RNA viruses have high mutation rates, and single-nucleotide variations may induce substantial phenotypic alterations in terms of viral function and pathogenicity. Thus, special assembly methods are required for viral genome analysis.

Results: PVGA takes a reference genome and a set of sequencing reads as input. It first constructs an alignment graph from the reference genome and the reads. The optimal genomic path is then determined through dynamic programming, maximizing the cumulative edge weights, which reflect read support density across the alignment graph. The resulting path corresponds to a refined genome. Finally, the process is repeated with the refined genome as the new reference until no further improvement is possible. We evaluate PVGA's performance on both assembly and polishing tasks using simulated and real datasets that include both long reads and short reads. The experiments demonstrate that PVGA always outperforms popular existing programs in terms of assembly quality, while its running time is comparable to that of the other programs. In particular, on simulated Nanopore datasets, our method correctly reports the true genomes with 0 mismatches and 0 indels.
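
The dynamic-programming step described above can be illustrated with a minimal sketch. This is not PVGA's implementation: the graph layout (one node per position and base, plus hypothetical `"S"` and `"E"` start and end nodes), the edge weights (assumed here to be counts of reads supporting each transition), and the function name `heaviest_path` are all assumptions made for illustration. The sketch only shows the general technique of finding a maximum-weight path in a directed acyclic graph.

```python
from collections import defaultdict


def heaviest_path(edges, source, sink):
    """Return (total_weight, path) for the maximum-weight source->sink path.

    edges maps (u, v) -> weight. The graph is assumed to be acyclic.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = {source, sink}
    for (u, v), w in edges.items():
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Kahn's algorithm: compute a topological order of the nodes.
    order = []
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:
        u = ready.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # DP: best[v] is the largest cumulative weight of any source->v path.
    best = {source: 0}
    prev = {}
    for u in order:
        if u not in best:
            continue  # u is not reachable from the source
        for v, w in succ[u]:
            if v not in best or best[u] + w > best[v]:
                best[v] = best[u] + w
                prev[v] = u

    if sink not in best:
        raise ValueError("sink is not reachable from source")

    # Walk the predecessor links back from the sink to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[sink], path


# Toy graph (hypothetical data): the reference reads ACGT, but four of the
# five reads support T instead of G at position 2.
edges = {
    ("S", (0, "A")): 5,
    ((0, "A"), (1, "C")): 5,
    ((1, "C"), (2, "G")): 1,
    ((1, "C"), (2, "T")): 4,
    ((2, "G"), (3, "T")): 1,
    ((2, "T"), (3, "T")): 4,
    ((3, "T"), "E"): 5,
}
weight, path = heaviest_path(edges, "S", "E")
genome = "".join(base for _, base in path[1:-1])
print(weight, genome)
```

In an iterative scheme like the one the abstract describes, the recovered sequence would serve as the reference for the next round of alignment, and the loop would stop once the sequence no longer changes. How PVGA builds its graph and assigns weights is not specified in this abstract.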

Conclusions: PVGA is a novel viral genome assembler that seamlessly integrates assembly and polishing into a unified workflow. Its design prioritizes high accuracy, enabling the detection of subtle genomic variations that can impact viral function and pathogenicity. By addressing the unique challenges of viral genome assembly, PVGA provides a reliable and precise solution for advancing our understanding of viral evolution and behavior.

Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.