Microbiome data: tell me which metrics and I will tell you which communities.

Impact factor: 6.1 (JCR Q1, Ecology)
ISME Communications. Pub Date: 2025-07-24. eCollection Date: 2025-01-01. DOI: 10.1093/ismeco/ycaf125
Alessandro Fuschi, Alessandra Merlotti, Daniel Remondini
ISME Communications, volume 5, issue 1, article ycaf125. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342790/pdf/

Abstract

In microbial community studies, analyzing diversity is crucial for uncovering ecological complexity. However, the intrinsic characteristics of next-generation sequencing data challenge the use of Euclidean metrics for estimating proximity and correlation. Consequently, a variety of distance measures have been developed within ecological frameworks. In this study, we compare several of these metrics (Bray-Curtis, Canberra, Jensen-Shannon, Hellinger, Euclidean, and Aitchison distances) and demonstrate how the choice of metric can significantly influence the interpretation of microbial community structures. Among these, the Aitchison distance, which is specifically defined for compositional data, behaves markedly differently from the others and highlights different features of the data. We consider two real-world examples: the human gut microbiome sampled using 16S rRNA sequencing, with multiple measurements for different patients (G-HMP2), and urban sewage environmental metagenomes collected over time at different sites through shotgun sequencing (E-WADES). We show that, for the same dataset, independently of the sequencing technique or the sampling context, the community structure depends strongly on the choice of metric. This can be explained by the mathematical properties of the chosen metrics and the specific characteristics of microbiome data, namely their high heterogeneity in species abundance. These results provide clear insights into how distance metrics influence interpretation and assist in choosing the most appropriate one for the study objectives.
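
As a rough illustration of why the choice of metric matters, the six distances named in the abstract can be sketched in plain Python. This is not the authors' code or data: the three count vectors are invented, the pseudocount of 0.5 used to handle zeros before the log-ratio transform is an assumption, and the Hellinger and Jensen-Shannon definitions follow one common convention (a 1/sqrt(2) factor and base-2 logarithms, respectively). Sample B differs from A in its two dominant taxa, while sample C differs from A only in two rare taxa.

```python
import math

def closure(counts):
    """Rescale a count vector to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def canberra(x, y):
    # Terms where both entries are zero are skipped.
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)

def hellinger(p, q):
    # Convention with the 1/sqrt(2) factor, so values lie in [0, 1].
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)

def jensen_shannon(p, q):
    # Square root of the Jensen-Shannon divergence, base-2 logarithm.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount is an assumed zero fix."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison(x, y, pseudocount=0.5):
    # Euclidean distance between CLR-transformed samples.
    return euclidean(clr(x, pseudocount), clr(y, pseudocount))

# Hypothetical toy counts for five taxa (each sample sums to 1000).
sample_a = [500, 300, 190, 5, 5]
sample_b = [300, 500, 190, 5, 5]   # the two dominant taxa are swapped
sample_c = [495, 297, 188, 19, 1]  # only the two rare taxa change markedly

pa, pb, pc = closure(sample_a), closure(sample_b), closure(sample_c)

proportion_metrics = {
    "Euclidean": euclidean,
    "Bray-Curtis": bray_curtis,
    "Canberra": canberra,
    "Hellinger": hellinger,
    "Jensen-Shannon": jensen_shannon,
}

for name, metric in proportion_metrics.items():
    print(f"{name:15s} d(A,B) = {metric(pa, pb):.4f}   d(A,C) = {metric(pa, pc):.4f}")
print(f"{'Aitchison':15s} d(A,B) = {aitchison(sample_a, sample_b):.4f}   "
      f"d(A,C) = {aitchison(sample_a, sample_c):.4f}")
```

On these toy vectors, Bray-Curtis places C much closer to A than B is, whereas the Aitchison distance reverses the ranking, because log-ratios weight fold changes in rare taxa as heavily as those in abundant ones. The effect is consistent with the abstract's point about heterogeneous species abundance, but the numbers here say nothing about the G-HMP2 or E-WADES results.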
