Microbiome data: tell me which metrics and I will tell you which communities.
Alessandro Fuschi, Alessandra Merlotti, Daniel Remondini
ISME Communications 5(1), ycaf125. Published 2025-07-24. DOI: https://doi.org/10.1093/ismeco/ycaf125
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12342790/pdf/
Abstract
In microbial community studies, analyzing diversity is crucial for uncovering ecological complexity. However, the intrinsic characteristics of next-generation sequencing data challenge the use of Euclidean metrics for estimating proximity and correlation. Consequently, a variety of distance measures have been developed within ecological frameworks. In this study, we compare several of these metrics (Bray-Curtis, Canberra, Jensen-Shannon, Hellinger, Euclidean, and Aitchison distances) and demonstrate how the choice of metric can significantly influence the interpretation of microbial community structures. Among these, the Aitchison distance, defined specifically for compositional data, behaves markedly differently from the others and highlights different features of the data. We consider two real-world examples: the human gut microbiome sampled using 16S rRNA sequencing with multiple measurements for different patients (G-HMP2), and urban sewage environmental metagenomes collected over time at different sites through shotgun sequencing (E-WADES). We show that, for the same dataset, independently of the sequencing technique or the sampling context, the community structure depends strongly on the choice of metric. This can be explained by the mathematical properties of the chosen metrics and by a specific characteristic of microbiome data, namely the high heterogeneity in species abundance. These results provide clear insights into how distance metrics influence interpretation and assist in choosing the most appropriate one for the study objectives.
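To make the comparison concrete, the following is a minimal sketch of the six distances named in the abstract, written with the Python standard library only. It is not the authors' code or data. The three toy samples, the pseudocount of 0.5 used to handle zeros before the log-ratio transform, the base-2 logarithm in the Jensen-Shannon distance, and the ecological convention for the Hellinger distance (Euclidean distance between square-rooted relative abundances, without the 1/sqrt(2) factor) are all assumptions made for illustration.

```python
import math


def closure(counts):
    """Rescale raw counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def bray_curtis(x, y):
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def canberra(x, y):
    # Taxa absent from both samples contribute nothing (avoids 0/0).
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)


def hellinger(p, q):
    # Assumed convention: Euclidean distance between sqrt-transformed proportions.
    return euclidean([math.sqrt(a) for a in p], [math.sqrt(b) for b in q])


def jensen_shannon(p, q):
    """Square root of the Jensen-Shannon divergence (log base 2, range [0, 1])."""
    def kl(u, v):
        return sum(a * math.log2(a / b) for a, b in zip(u, v) if a > 0)

    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison(x, y, pseudocount=0.5):
    """Euclidean distance between CLR-transformed samples."""
    return euclidean(clr(x, pseudocount), clr(y, pseudocount))


# Hypothetical samples with highly uneven abundances.
sample_a = [900, 90, 9, 1]   # reference
sample_b = [900, 90, 1, 9]   # the two rare taxa are swapped
sample_c = [600, 390, 9, 1]  # the two dominant taxa shift

pa, pb, pc = closure(sample_a), closure(sample_b), closure(sample_c)

distances = {
    "Euclidean": (euclidean(pa, pb), euclidean(pa, pc)),
    "Bray-Curtis": (bray_curtis(pa, pb), bray_curtis(pa, pc)),
    "Canberra": (canberra(pa, pb), canberra(pa, pc)),
    "Hellinger": (hellinger(pa, pb), hellinger(pa, pc)),
    "Jensen-Shannon": (jensen_shannon(pa, pb), jensen_shannon(pa, pc)),
    "Aitchison": (aitchison(sample_a, sample_b), aitchison(sample_a, sample_c)),
}

for name, (d_ab, d_ac) in distances.items():
    print(f"{name:15s} d(a,b)={d_ab:.4f}  d(a,c)={d_ac:.4f}")
```

With these toy numbers, Bray-Curtis places sample a much closer to b than to c, because the swap involves only rare taxa, whereas the Aitchison distance (and, in this particular case, Canberra) ranks the pairs the other way round, because it responds to fold changes regardless of abundance. This is only an illustration of the kind of metric-dependent behavior the abstract describes, not a reproduction of the paper's results.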