metaGEENOME: an integrated framework for differential abundance analysis of microbiome data in cross-sectional and longitudinal studies.

IF 3.3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS

BMC Bioinformatics | Pub Date: 2025-07-21 | DOI: 10.1186/s12859-025-06217-x

Ahmed Abdelkader, Nur A Ferdous, Mohamed El-Hadidi, Tomasz Burzykowski, Mohamed Mysara

BMC Bioinformatics 26(1):189. Journal Article. PMC full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281747/pdf/

Citations: 0

Abstract


metaGEENOME: an integrated framework for differential abundance analysis of microbiome data in cross-sectional and longitudinal studies.

Background: Detecting biomarkers is a key objective in microbiome research, often pursued through 16S rRNA amplicon sequencing or shotgun metagenomic analysis. A critical step in this process is differential abundance (DA) analysis, which aims to pinpoint taxa whose abundance differs significantly between groups. However, DA analysis remains challenging due to high dimensionality, compositionality, sparsity, inter-taxa correlations, uneven abundance distributions, and missing values, all of which hinder our ability to model the data accurately. Despite the availability of many DA tools, balancing high statistical power with effective false discovery rate (FDR) control remains a major limitation.

Results: Here, we introduce a novel approach for DA analysis that integrates counts adjusted with Trimmed Mean of M-values factors (CTF) normalization and Centered Log Ratio (CLR) transformation with a Generalized Estimating Equation (GEE) model. We benchmarked our approach against eight widely used tools, using both simulated and real datasets in cross-sectional and longitudinal settings. While several tools (e.g., metagenomeSeq, edgeR, DESeq2, and LEfSe) achieved high sensitivity, they often failed to adequately control the FDR. In contrast, our method demonstrated high sensitivity and specificity when compared to other approaches that successfully controlled the FDR, including ALDEx2, limma-voom, ANCOM, and ANCOM-BC2.
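To make the CLR step concrete, the following is a minimal sketch of a centered log-ratio transform for a single sample, written in plain Python. It is not taken from the metaGEENOME package: the function name `clr` and the pseudocount of 0.5 used to handle zero counts are assumptions made for illustration, and the abstract does not say how the authors treat zeros.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Each value becomes log(count) minus the mean of the logs, i.e. the log
    of the count divided by the geometric mean of the sample.
    The pseudocount is an illustrative assumption to avoid log(0).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical counts for four taxa in one sample.
sample = [120, 30, 0, 850]
transformed = clr(sample)
print([round(v, 3) for v in transformed])
```

CLR values within a sample always sum to zero, and without a pseudocount the transform does not change when every count is multiplied by the same factor, which is why it is a common choice for compositional data.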

Conclusions: Our approach effectively addresses key challenges in microbiome data analysis across both cross-sectional and longitudinal designs. Integrated into the R package metaGEENOME (https://github.com/M-Mysara/metaGEENOME), our framework provides a flexible, scalable and statistically robust solution for DA analysis, offering improved FDR control and enhanced performance for biomarker discovery in microbiome studies.
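As an illustration of what FDR control and the benchmark metrics involve, the sketch below adjusts per-taxon p-values with the Benjamini-Hochberg procedure and then scores a set of calls against a known truth, as one could do with simulated data. The abstract does not state which multiple-testing correction the authors use, so Benjamini-Hochberg is an assumption here, and the function names and example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def empirical_fdr_and_sensitivity(called, truth):
    """Compare called taxa with truly differential taxa (both sets)."""
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    fdr = false_pos / (true_pos + false_pos) if called else 0.0
    sensitivity = true_pos / (true_pos + false_neg) if truth else 0.0
    return fdr, sensitivity


# Hypothetical p-values for four taxa.
taxa = ["taxon_a", "taxon_b", "taxon_c", "taxon_d"]
pvals = [0.01, 0.04, 0.03, 0.005]
qvals = benjamini_hochberg(pvals)
called = {t for t, q in zip(taxa, qvals) if q <= 0.03}

# Hypothetical ground truth, as would be known in a simulation.
truth = {"taxon_a", "taxon_c"}
fdr, sensitivity = empirical_fdr_and_sensitivity(called, truth)
print(qvals, sorted(called), fdr, sensitivity)
```

A tool with high sensitivity but poor FDR control would recover most of `truth` while also placing many taxa outside it in `called`.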


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.