Group-wise normalization in differential abundance analysis of microbiome samples.

Impact factor 3.3 · Zone 3, Biology · JCR Q2, Biochemical Research Methods
Dylan Clark-Boucher, Brent A Coull, Harrison T Reeder, Fenglei Wang, Qi Sun, Jacqueline R Starr, Kyu Ha Lee
BMC Bioinformatics 26(1):196. Published 2025-07-29. DOI: 10.1186/s12859-025-06235-9 (https://doi.org/10.1186/s12859-025-06235-9). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308967/pdf/

Abstract

Background: A key challenge in differential abundance analysis (DAA) of microbial sequencing data is that the counts for each sample are compositional, resulting in potentially biased comparisons of the absolute abundance across study groups. Normalization-based DAA methods rely on external normalization factors that account for compositionality by standardizing the counts onto a common numerical scale. However, existing normalization methods have struggled to maintain the false discovery rate in settings where the variance or compositional bias is large. This article proposes a novel framework for normalization that can reduce bias in DAA by re-conceptualizing normalization as a group-level task. We present two new normalization methods within the group-wise framework: group-wise relative log expression (G-RLE) and fold-truncated sum scaling (FTSS).
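
The compositional bias described above can be illustrated with a small sketch. It does not implement G-RLE or FTSS, which the abstract does not define; it only contrasts two existing sample-wise approaches, total sum scaling and the standard median-of-ratios form of relative log expression (RLE). The abundances, the sequencing depth, and the function name `rle_factors` are hypothetical, chosen for illustration.

```python
import numpy as np

def rle_factors(counts):
    """Sample-wise RLE (median-of-ratios) factors for a samples x taxa array."""
    counts = np.asarray(counts, dtype=float)
    keep = (counts > 0).all(axis=0)        # geometric mean needs nonzero counts
    log_counts = np.log(counts[:, keep])
    log_ref = log_counts.mean(axis=0)      # per-taxon log geometric mean
    return np.exp(np.median(log_counts - log_ref, axis=1))

# Hypothetical absolute abundances: only taxon 0 differs between the groups.
abs_a = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
abs_b = abs_a.copy()
abs_b[0] *= 10.0                           # true 10-fold increase in taxon 0

# Sequencing observes proportions at a fixed depth, not absolute abundance.
depth = 10_000
counts = np.vstack([abs_a / abs_a.sum(), abs_b / abs_b.sum()]) * depth

# Total sum scaling: divide each sample by its library size.
tss = counts / counts.sum(axis=1, keepdims=True)
fc_tss = tss[1] / tss[0]                   # unchanged taxa appear to drop to 0.625

# Sample-wise RLE: divide each sample by its median-of-ratios factor.
rle = counts / rle_factors(counts)[:, None]
fc_rle = rle[1] / rle[0]                   # unchanged taxa return to 1.0

print("TSS fold changes:", np.round(fc_tss, 3))
print("RLE fold changes:", np.round(fc_rle, 3))
```

In this toy case RLE recovers the true fold changes only because most taxa are unchanged and there is no sampling noise. The abstract's point is that such factors can break down when the variance or the compositional bias is large.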

Results: G-RLE and FTSS achieve higher statistical power for identifying differentially abundant taxa than existing methods in model-based and synthetic data simulation settings. The two novel methods also maintain the false discovery rate in challenging scenarios where existing methods suffer. The best results are obtained from using FTSS normalization with the DAA method MetagenomeSeq.
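
The two criteria named above, false discovery rate and statistical power, are typically estimated in simulations where the truly differentially abundant taxa are known. The sketch below shows one common way to compute them for a single simulated dataset. It assumes Benjamini-Hochberg adjustment at level 0.05, which the abstract does not specify, and the p-values and truth labels are made up for illustration.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank meeting its threshold
        rejected[order[:k + 1]] = True
    return rejected

def fdp_and_power(rejected, truly_da):
    """False discovery proportion and power for one simulated dataset."""
    rejected = np.asarray(rejected, dtype=bool)
    truly_da = np.asarray(truly_da, dtype=bool)
    false_pos = np.sum(rejected & ~truly_da)
    true_pos = np.sum(rejected & truly_da)
    fdp = false_pos / max(rejected.sum(), 1)
    power = true_pos / max(truly_da.sum(), 1)
    return fdp, power

# Hypothetical p-values for six taxa and their simulated ground truth.
pvals = [0.001, 0.004, 0.012, 0.03, 0.6, 0.9]
truly_da = [True, True, False, True, False, False]

rejected = benjamini_hochberg(pvals)
fdp, power = fdp_and_power(rejected, truly_da)
print(f"rejected={int(rejected.sum())}, FDP={fdp:.2f}, power={power:.2f}")
```

Averaging the false discovery proportion over many simulated datasets gives an estimate of the false discovery rate; a method "maintains" it when that average stays at or below the nominal level.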

Conclusion: Compared with other methods for normalizing compositional sequence count data prior to DAA, the proposed group-level normalization frameworks offer more robust statistical inference. With a solid mathematical foundation, validated performance in numerical studies, and publicly available software, these new methods can help improve rigor and reproducibility in microbiome research.

Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.