A workflow for statistical analysis and visualization of microbiome omics data using the R microeco package.

Impact factor 16 · Region 1 (Biology) · JCR Q1: Biochemical Research Methods

Nature Protocols · Pub Date: 2025-08-06 · DOI: 10.1038/s41596-025-01239-4

Chi Liu, Felipe R P Mansoldo, Hankang Li, Alane Beatriz Vermelho, Raymond Jianxiong Zeng, Xiangzhen Li, Minjie Yao



Abstract

The increasing complexity of experimental designs and volume of data in the microbiome field, along with the diversification of omics data types, pose substantial challenges to statistical analysis and visualization. Here we present a step-by-step protocol based on the R microeco package (https://github.com/ChiLiubio/microeco) that details the statistical analysis and visualization of microbiome data. The omics data types covered are amplicon sequencing data, metagenomic sequencing data and nontargeted metabolomics data. The analysis of amplicon sequencing data involves data preprocessing and normalization, core taxa, alpha diversity, beta diversity, differential abundance testing and machine learning. We consider various data analysis scenarios in each section to demonstrate the comprehensiveness of the protocol. We emphasize that, following best analytical practices, each part of the analysis uses data normalized by the method suited to it. Additionally, in the differential abundance testing analysis, we adopt parametric community simulation to evaluate the performance of various testing approaches. For metagenomic data, the focus is on how the outputs of bioinformatic analysis are read and preprocessed, which is where usage differs most from amplicon sequencing data. For metabolomics data, we mainly demonstrate differential testing, machine learning and association analysis with microbial abundances. To address some complex analyses, this protocol extensively combines different types of methods to build an analysis pipeline. This protocol is more comprehensive and scalable than alternative methods. The provided R code runs in about 6 h on a laptop computer.
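
To make the normalization, alpha diversity and beta diversity steps named in the abstract concrete, here is a minimal sketch in base R. It is an illustration under stated assumptions, not the protocol's code and not the microeco API: the count table is made up, total-sum scaling stands in for whichever normalization the protocol selects, and the helper names `shannon_index` and `bray_curtis` are hypothetical.

```r
# Toy count table (made-up values): rows are taxa, columns are samples.
counts <- matrix(
  c(10,  0,  5,
    10, 20,  5,
     0,  0, 10),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_a", "taxon_b", "taxon_c"), c("s1", "s2", "s3"))
)

# Normalization (assumption: total-sum scaling to relative abundances).
rel <- sweep(counts, 2, colSums(counts), "/")

# Alpha diversity: Shannon index per sample, ignoring absent taxa.
shannon_index <- function(p) {
  p <- p[p > 0]
  -sum(p * log(p))
}
shannon <- apply(rel, 2, shannon_index)

# Beta diversity: Bray-Curtis dissimilarity between every pair of samples.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)
n <- ncol(rel)
bc <- matrix(0, n, n, dimnames = list(colnames(rel), colnames(rel)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(rel[, i], rel[, j])
  }
}

print(round(shannon, 3))
print(round(bc, 3))
```

In a real analysis these quantities would come from the package's own classes on a full dataset; the sketch only shows what is being computed at each step.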




Source journal

Nature Protocols (Biology: Biochemical Research Methods)

CiteScore: 29.10

Self-citation rate: 0.70%

Articles published: 128

Review time: 4 months

About the journal: Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured. The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.