Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities.

IF 4.6 Q2 MATERIALS SCIENCE, BIOMATERIALS
Gleb Goussarov, Mohamed Mysara, Ilse Cleenwerck, Jürgen Claesen, Natalie Leys, Peter Vandamme, Rob Van Houdt
DOI: 10.1099/mic.0.001469 (https://doi.org/10.1099/mic.0.001469)
Publication date: 2024-06-01
Publication type: Journal Article
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11261854/pdf/
Citation count: 0

Abstract

Metagenome community analysis, driven by the continued development of sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but the reads are too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.

Source journal (as listed on this page)
ACS Applied Bio Materials (subject area: Chemistry, Chemistry (all))
CiteScore: 9.40
Self-citation rate: 2.10%
Articles published: 464