Group-wise normalization in differential abundance analysis of microbiome samples.

IF 3.3 | Zone 3, Biology | JCR Q2, Biochemical Research Methods

BMC Bioinformatics Pub Date : 2025-07-29 DOI:10.1186/s12859-025-06235-9

Dylan Clark-Boucher, Brent A Coull, Harrison T Reeder, Fenglei Wang, Qi Sun, Jacqueline R Starr, Kyu Ha Lee

BMC Bioinformatics 26(1):196. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308967/pdf/


Abstract

Background: A key challenge in differential abundance analysis (DAA) of microbial sequencing data is that the counts for each sample are compositional, resulting in potentially biased comparisons of the absolute abundance across study groups. Normalization-based DAA methods rely on external normalization factors that account for compositionality by standardizing the counts onto a common numerical scale. However, existing normalization methods have struggled to maintain the false discovery rate in settings where the variance or compositional bias is large. This article proposes a novel framework for normalization that can reduce bias in DAA by re-conceptualizing normalization as a group-level task. We present two new normalization methods within the group-wise framework: group-wise relative log expression (G-RLE) and fold-truncated sum scaling (FTSS).
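To make the compositionality problem concrete, the following minimal Python sketch contrasts total sum scaling (TSS) with the classical sample-wise relative log expression (RLE, median-of-ratios) factors on a toy data set. It is not the G-RLE or FTSS method proposed in the paper, whose definitions are not given in this abstract; the function names `tss_factors` and `rle_factors` and the toy counts are assumptions made for illustration only.

```python
import math
from statistics import median


def tss_factors(counts):
    """Total sum scaling: one factor per sample, equal to its library size."""
    return [sum(sample) for sample in counts]


def rle_factors(counts):
    """Classical sample-wise RLE (median-of-ratios) size factors.

    counts: list of samples, each a list of taxon counts in the same order.
    Only taxa with positive counts in every sample enter the reference.
    """
    n_taxa = len(counts[0])
    usable = [j for j in range(n_taxa) if all(s[j] > 0 for s in counts)]
    if not usable:
        raise ValueError("no taxon has a positive count in every sample")
    # Reference: log geometric mean of each usable taxon across samples.
    log_ref = {
        j: sum(math.log(s[j]) for s in counts) / len(counts) for j in usable
    }
    # Size factor: median ratio of a sample's counts to the reference.
    return [
        math.exp(median(math.log(s[j]) - log_ref[j] for j in usable))
        for s in counts
    ]


# Hypothetical toy data: taxon 0 is ten times more abundant in sample B in
# absolute terms; taxa 1-4 are unchanged. Both samples are observed at the
# same sampling fraction, so unchanged taxa have identical counts.
counts = [
    [200, 200, 200, 200, 200],   # sample A (group 1)
    [2000, 200, 200, 200, 200],  # sample B (group 2)
]

tss = tss_factors(counts)
rle = rle_factors(counts)

# Normalized abundance of an unchanged taxon (taxon 1) under each scheme.
tss_unchanged = [counts[i][1] / tss[i] for i in range(2)]
rle_unchanged = [counts[i][1] / rle[i] for i in range(2)]

print("TSS factors:", tss)
print("RLE factors:", rle)
print("Taxon 1 after TSS:", tss_unchanged)
print("Taxon 1 after RLE:", rle_unchanged)
```

In this toy case only one of five taxa changes, so the median-based RLE factors are equal across the two samples, whereas TSS makes the four unchanged taxa look depleted in sample B. When a large share of taxa change, the median ratio itself can shift, which may be related to the settings the abstract describes as challenging for existing methods.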

Results: G-RLE and FTSS achieve higher statistical power for identifying differentially abundant taxa than existing methods in model-based and synthetic data simulation settings. The two novel methods also maintain the false discovery rate in challenging scenarios where existing methods suffer. The best results are obtained from using FTSS normalization with the DAA method MetagenomeSeq.
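The two evaluation criteria named above, statistical power and the false discovery rate, can be computed in a simulation where the truly differential taxa are known. The sketch below is a generic illustration rather than the authors' evaluation code; the function name `fdp_and_power` and the taxon labels are hypothetical.

```python
def fdp_and_power(discovered, truly_differential):
    """Return (false discovery proportion, power) for one simulated data set.

    discovered: taxa that a DAA method flagged as differentially abundant.
    truly_differential: taxa that are differentially abundant by construction.
    """
    discovered = set(discovered)
    truth = set(truly_differential)
    true_pos = len(discovered & truth)
    false_pos = len(discovered - truth)
    # By convention the proportion is 0 when nothing is discovered.
    fdp = false_pos / len(discovered) if discovered else 0.0
    power = true_pos / len(truth) if truth else float("nan")
    return fdp, power


# Hypothetical single replicate: 3 discoveries, 4 truly differential taxa.
fdp, power = fdp_and_power(
    discovered=["taxon_1", "taxon_2", "taxon_7"],
    truly_differential=["taxon_1", "taxon_2", "taxon_3", "taxon_4"],
)
print(fdp, power)
```

The false discovery rate is the expected value of the false discovery proportion, so in practice it is estimated by averaging `fdp` over many simulated replicates and comparing the average with the nominal level.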

Conclusion: Compared with other methods for normalizing compositional sequence count data prior to DAA, the proposed group-level normalization frameworks offer more robust statistical inference. With a solid mathematical foundation, validated performance in numerical studies, and publicly available software, these new methods can help improve rigor and reproducibility in microbiome research.



Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.