Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities.

IF 2.6 · CAS Zone 4 (Biology) · JCR Q3 Microbiology

Microbiology-Sgm · Pub Date: 2024-06-01 · DOI: 10.1099/mic.0.001469

Gleb Goussarov, Mohamed Mysara, Ilse Cleenwerck, Jürgen Claesen, Natalie Leys, Peter Vandamme, Rob Van Houdt

Volume 170, issue 6. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11261854/pdf/

Citations: 0

Abstract

Metagenome community analysis, driven by continued developments in sequencing technology, is rapidly providing insights into many aspects of microbiology and is becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but the reads are too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short reads, long reads or both on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handling such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.

Source journal

Microbiology-Sgm (Biology: Microbiology)

CiteScore: 4.60
Self-citation rate: 7.10%
Articles published: 132
Review time: 3.0 months

About the journal: We publish high-quality original research on bacteria, fungi, protists, archaea, algae, parasites and other microscopic life forms. Topics include but are not limited to: Antimicrobials and antimicrobial resistance; Bacteriology and parasitology; Biochemistry and biophysics; Biofilms and biological systems; Biotechnology and bioremediation; Cell biology and signalling; Chemical biology; Cross-disciplinary work; Ecology and environmental microbiology; Food microbiology; Genetics; Host–microbe interactions; Microbial methods and techniques; Microscopy and imaging; Omics, including genomics, proteomics and metabolomics; Physiology and metabolism; Systems biology and synthetic biology; The microbiome.