Ahmed Abdelkader, Nur A Ferdous, Mohamed El-Hadidi, Tomasz Burzykowski, Mohamed Mysara
BMC Bioinformatics 26(1):189. Published 2025-07-21. DOI: https://doi.org/10.1186/s12859-025-06217-x. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281747/pdf/
metaGEENOME: an integrated framework for differential abundance analysis of microbiome data in cross-sectional and longitudinal studies.
Background: Detecting biomarkers is a key objective in microbiome research, often done through 16S rRNA amplicon sequencing or shotgun metagenomic analysis. A critical step in this process is differential abundance (DA) analysis, which aims to pinpoint taxa whose abundance differs significantly between groups. However, DA analysis remains challenging due to high dimensionality, compositionality, sparsity, inter-taxa correlations, uneven abundance distributions, and missing values, all of which hinder our ability to model the data accurately. Despite the availability of many DA tools, balancing high statistical power with effective false discovery rate (FDR) control remains a major limitation.
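To make the trade-off between power and FDR control concrete, the following is a minimal sketch of how the two quantities can be scored for a single simulated benchmark run, where the truly differential taxa are known. The function name, the taxon labels, and the toy sets are illustrative assumptions and are not taken from the paper or from the metaGEENOME package. Note that a single run yields the false discovery proportion; the FDR is its expectation over repeated runs.

```python
def sensitivity_and_fdp(truly_da, called_da):
    """Score one benchmark run (hypothetical helper, not from metaGEENOME).

    truly_da  : taxa that are differentially abundant by construction
    called_da : taxa that a DA tool reported as significant
    Returns (sensitivity, false discovery proportion).
    """
    truly_da = set(truly_da)
    called_da = set(called_da)
    true_pos = len(truly_da & called_da)    # correct discoveries
    false_pos = len(called_da - truly_da)   # spurious discoveries
    # Sensitivity (power): share of true signals that were recovered.
    sensitivity = true_pos / len(truly_da) if truly_da else 0.0
    # False discovery proportion: share of reported taxa that are spurious.
    fdp = false_pos / len(called_da) if called_da else 0.0
    return sensitivity, fdp


# Toy example with made-up taxon labels.
truth = {"taxon_A", "taxon_B", "taxon_C", "taxon_D"}
calls = {"taxon_A", "taxon_B", "taxon_C", "taxon_X", "taxon_Y"}
sens, fdp = sensitivity_and_fdp(truth, calls)
print(f"sensitivity={sens:.2f}, FDP={fdp:.2f}")
```

In this toy run the tool recovers three of four true signals but two of its five calls are false, which is the pattern the abstract attributes to tools that are sensitive yet poorly calibrated.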
Results: Here, we introduce a novel approach for DA analysis that integrates counts adjusted with Trimmed Mean of M-values factors (CTF) normalization and Centered Log Ratio (CLR) transformation with a Generalized Estimating Equation (GEE) model. We benchmarked our approach against eight widely used tools using both simulated and real datasets in cross-sectional and longitudinal settings. While several tools (e.g., metagenomeSeq, edgeR, DESeq2, and LEfSe) achieved high sensitivity, they often failed to adequately control the FDR. In contrast, our method demonstrated high sensitivity and specificity when compared to other approaches that successfully controlled the FDR, including ALDEx2, limma-voom, ANCOM, and ANCOM-BC2.
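As an illustration of the CLR step only, here is a minimal sketch in plain Python. It is not the metaGEENOME implementation: the function name, the pseudocount of 0.5 used to handle zero counts, and the example counts are all assumptions, and the CTF normalization and the GEE fit that the abstract describes are not shown.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample (illustrative sketch).

    Each value becomes log(x_i) minus the mean of the logs, i.e. the log of
    x_i divided by the geometric mean of the sample. The pseudocount is an
    assumed way of avoiding log(0); the package may handle zeros differently.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Made-up counts for four taxa in a single sample.
sample = [120, 30, 0, 850]
clr_values = clr(sample)
print([round(v, 3) for v in clr_values])
```

CLR values sum to zero within a sample and, when no pseudocount is added, are unchanged if every count in the sample is multiplied by the same factor, which is why the transform is commonly used for compositional data. In Python, a GEE could then be fitted to the transformed values with, for example, the GEE class in statsmodels; that choice is likewise an assumption and not something the abstract specifies.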
Conclusions: Our approach effectively addresses key challenges in microbiome data analysis across both cross-sectional and longitudinal designs. Integrated into the R package metaGEENOME (https://github.com/M-Mysara/metaGEENOME), our framework provides a flexible, scalable and statistically robust solution for DA analysis, offering improved FDR control and enhanced performance for biomarker discovery in microbiome studies.
About the journal:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.