BonoboFlow: viral genome assembly and haplotype reconstruction from nanopore reads.

Journal impact factor: 2.8; JCR quartile: Q2 (Mathematical & Computational Biology)

Bioinformatics Advances, vol. 5, no. 1, vbaf115. Published 2025-05-13 (eCollection date 2025-01-01). DOI: 10.1093/bioadv/vbaf115

Christian Ndekezi, Drake Byamukama, Frank Kato, Denis Omara, Angella Nakyanzi, Fortunate Natwijuka, Susan Mugaba, Alfred Ssekagiri, Nicholas Bbosa, Obondo James Sande, Magambo Phillip Kimuda, Denis K Byarugaba, Anne Kapaata, Jyoti Sutar, Jayanta Bhattacharya, Pontiano Kaleebu, Sheila N Balinda

Abstract

Summary: Viral genome sequencing and analysis are crucial for understanding the diversity and evolution of viruses. Traditional Sanger sequencing is labor-intensive and limited by low sequencing depth. Next-generation sequencing (NGS) platforms such as Illumina offer greater depth and throughput, but because they sequence short fragments of the genome, accurately reconstructing complete viral genomes remains challenging. Third-generation sequencing platforms, such as PacBio and Oxford Nanopore Technologies (ONT), generate long reads at high throughput. However, PacBio is constrained by substantial resource requirements, while ONT reads have inherently high error rates. Moreover, few standardized pipelines cover the full ONT workflow, from basecalling to genome assembly.

Results: Here, we introduce BonoboFlow, a standardized Nextflow pipeline designed to streamline ONT-based viral genome assembly and haplotype reconstruction. BonoboFlow integrates the key processing steps, including basecalling, read filtering, chimeric read removal, error correction, draft genome assembly or haplotype reconstruction, and genome polishing. The pipeline accepts raw POD5 files or basecalled FASTQ files as input, produces consensus sequences in FASTA format as output, and uses a reference genome (in FASTA format) to filter out contaminant reads. Containerized images for Docker and Singularity allow BonoboFlow to be deployed consistently across diverse computing environments. While BonoboFlow performs well at assembling small and medium-sized viral genomes, it had difficulty reconstructing large viral genomes.
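
Read filtering is one of the steps listed above. As a rough illustration of what such a step typically does with ONT reads, the Python sketch below drops reads that fall outside a length window or whose mean quality is below a threshold. It is not BonoboFlow's code: the abstract does not state which tools or criteria the pipeline uses, and the function names (`parse_fastq`, `mean_phred`, `filter_reads`) and default thresholds are hypothetical. Mean quality is computed by averaging per-base error probabilities and converting back to a Phred score, a common convention for nanopore reads.

```python
"""Illustrative ONT read filter: length window + mean quality (hypothetical sketch, not BonoboFlow's code)."""
import io
import math
from typing import Iterable, Iterator, NamedTuple, Optional, TextIO


class FastqRecord(NamedTuple):
    header: str
    seq: str
    qual: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a plain four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header, seq, qual)


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean read quality: average the error probabilities, then convert back to Phred."""
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def filter_reads(
    records: Iterable[FastqRecord],
    min_len: int = 1000,            # hypothetical default
    max_len: Optional[int] = None,  # no upper bound unless given
    min_q: float = 10.0,            # hypothetical default
) -> Iterator[FastqRecord]:
    """Keep reads whose length is in [min_len, max_len] and whose mean quality is >= min_q."""
    for rec in records:
        length = len(rec.seq)
        if length < min_len or (max_len is not None and length > max_len):
            continue
        if mean_phred(rec.qual) < min_q:
            continue
        yield rec


# Usage on a tiny in-memory FASTQ ('I' encodes Q40, '#' encodes Q2).
demo = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"      # kept: long enough, Q40
    "@r2\nACGTAC\n+\n######\n"  # dropped: mean quality Q2
    "@r3\nAC\n+\nII\n"          # dropped: shorter than min_len
)
kept = [rec.header for rec in filter_reads(parse_fastq(demo), min_len=3, min_q=10.0)]
print(kept)  # ['@r1']
```

An arithmetic mean of Phred scores tends to overstate read quality, because a few low-quality bases contribute far more errors than their Phred values suggest; averaging error probabilities avoids this.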

Availability and implementation: BonoboFlow and the corresponding container images are publicly available at https://github.com/nchis09/BonoboFlow and https://hub.docker.com/r/nchis09/bonobo_image. The test dataset is available from the NCBI Sequence Read Archive (SRA) under BioProject accession PRJNA1137155 (http://www.ncbi.nlm.nih.gov/bioproject/1137155).

Source journal: Bioinformatics Advances (CiteScore: 1.60; self-citation rate: 0.00%)