Differences in metagenome coverage may confound abundance-based and diversity conclusions and how to deal with them.

Impact Factor: 6.1; JCR quartile: Q1 (Ecology)
ISME communications. Pub Date: 2025-09-10. eCollection Date: 2025-01-01. DOI: 10.1093/ismeco/ycaf140
Borja Aldeguer-Riquelme, Luis M Rodriguez-R, Konstantinos T Konstantinidis
Volume 5(1), article ycaf140. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12477595/pdf/

Abstract

The importance of rarefying ecological or amplicon sequencing data to a standardized level of diversity coverage for reliable diversity comparisons across samples is well recognized. However, the importance of diversity coverage, i.e., the fraction of a sample's genomic diversity that has been sequenced, remains frequently overlooked in comparative shotgun metagenomic studies. Using both in silico and natural metagenomes from a wide range of environments, we demonstrate that uneven metagenome coverage can lead to misleading biological conclusions, particularly when identifying differentially abundant features (i.e., groups of genes or genomes assigned to the same protein family or taxonomic rank, respectively) and when comparing diversity between samples. The main underlying cause is that, depending on the sequencing effort applied and the underlying member-abundance curves, not all members of a feature may be detectable, and thus counted, across such unevenly covered metagenomes. Unfortunately, 99.5% of previous comparative metagenomic studies have overlooked this metric, suggesting that their reported results might be misleading. We show that achieving high Nonpareil coverage (≥0.9), where Nonpareil coverage is a metric that estimates metagenome diversity coverage, is the most reliable strategy to mitigate this issue. When high Nonpareil coverage is not achievable, as for highly diverse and complex samples such as soils, we show that standardizing (or subsampling) metagenomic datasets to the same Nonpareil coverage, rather than to the same sequencing effort, prior to comparative analysis provides more accurate results. We provide a set of practical recommendations and the corresponding Python scripts to help researchers assess and standardize metagenome diversity coverage for their comparative analyses.
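The mechanism described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the authors' scripts: the community, the read counts, and all function names are hypothetical, and Good's coverage (one minus the fraction of reads that are singletons) is used only as a simple stand-in for a coverage estimate. It is a different estimator from Nonpareil, which is not called here. The sketch sequences the same community at two depths, shows that fewer members of a feature are detected in the shallow sample, and then subsamples the deep sample down to the shallow sample's coverage rather than to a fixed number of reads.

```python
import random
from collections import Counter


def simulate_reads(abundances, n_reads, seed):
    """Draw n_reads member labels with probability proportional to abundance."""
    rng = random.Random(seed)
    members = list(abundances)
    weights = [abundances[m] for m in members]
    return rng.choices(members, weights=weights, k=n_reads)


def goods_coverage(reads):
    """Good's coverage: 1 - singletons / total reads.

    Assumption: used here as a simple proxy; this is NOT Nonpareil coverage.
    """
    counts = Counter(reads)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / len(reads)


def subsample_to_coverage(reads, target, seed, step=50):
    """Return the smallest random subsample (in increments of `step` reads)
    whose proxy coverage reaches `target`; fall back to all reads otherwise."""
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    for n in range(step, len(shuffled) + 1, step):
        if goods_coverage(shuffled[:n]) >= target:
            return shuffled[:n]
    return shuffled


# Hypothetical feature with 200 members and a steep rank-abundance curve.
community = {f"member_{rank}": 1.0 / rank for rank in range(1, 201)}

# Same community, very different sequencing effort.
shallow = simulate_reads(community, 500, seed=1)
deep = simulate_reads(community, 20000, seed=2)

# Standardize the deep sample to the coverage of the shallow one.
target = goods_coverage(shallow)
standardized = subsample_to_coverage(deep, target, seed=3)

for name, reads in [("shallow", shallow), ("deep", deep),
                    ("deep, standardized", standardized)]:
    print(f"{name}: {len(reads)} reads, "
          f"{len(set(reads))} members detected, "
          f"coverage {goods_coverage(reads):.3f}")
```

Because both samples are drawn from an identical community, any difference in the number of detected members between the shallow and deep samples is an artifact of coverage alone. Standardizing by coverage targets that artifact directly, whereas in communities of different complexity an equal number of reads does not imply equal coverage.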

