A workflow for statistical analysis and visualization of microbiome omics data using the R microeco package.

Impact factor 16; Zone 1 (Biology); JCR Q1 (Biochemical Research Methods)
Chi Liu, Felipe R P Mansoldo, Hankang Li, Alane Beatriz Vermelho, Raymond Jianxiong Zeng, Xiangzhen Li, Minjie Yao
Nature Protocols. Published 2025-08-06. DOI: 10.1038/s41596-025-01239-4 ( https://doi.org/10.1038/s41596-025-01239-4 )

Abstract

The increasing complexity of experimental designs and the volume of data in the microbiome field, along with the diversification of omics data types, pose substantial challenges to statistical analysis and visualization. Here we present a step-by-step protocol based on the R microeco package ( https://github.com/ChiLiubio/microeco ) that details the statistical analysis and visualization of microbiome data. The omics data types covered are amplicon sequencing data, metagenomic sequencing data and nontargeted metabolomics data. The analysis of amplicon sequencing data involves data preprocessing and normalization, core taxa, alpha diversity, beta diversity, differential abundance testing and machine learning. We consider various data analysis scenarios in each section to demonstrate the comprehensiveness of the protocol. We emphasize that, following best analytical practices, data normalized by different methods are selected for the subsequent analysis in each part. Additionally, in the differential abundance testing analysis, we adopt parametric community simulation to enable the performance evaluation of various testing approaches. For metagenomic data, the focus is on how the outputs of bioinformatic analysis are read and preprocessed, which is where usage differs most from amplicon sequencing data. For metabolomics data, we mainly demonstrate differential testing, machine learning and association analysis with microbial abundances. To address some complex analyses, this protocol extensively combines different types of methods to build an analysis pipeline. This protocol is more comprehensive and scalable than alternative methods. The provided R code can run in about 6 h on a laptop computer.
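As a rough illustration of three of the steps named in the abstract (normalization, alpha diversity and beta diversity), the sketch below computes relative abundances, the Shannon index and Bray-Curtis dissimilarity on toy count vectors. It is plain Python written for this note, not the microeco API and not the protocol's code. The function names and the toy counts are hypothetical, and total-sum scaling stands in for whichever normalization method the protocol selects for a given analysis.

```python
import math


def relative_abundance(counts):
    """Total-sum scaling: convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index (natural log), a common alpha diversity measure."""
    props = relative_abundance(counts)
    # Taxa with zero abundance contribute nothing to the sum.
    return -sum(p * math.log(p) for p in props if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples, on relative abundances."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same taxa in the same order")
    pa = relative_abundance(counts_a)
    pb = relative_abundance(counts_b)
    numerator = sum(abs(x - y) for x, y in zip(pa, pb))
    denominator = sum(pa) + sum(pb)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical toy samples: counts for four taxa in the same order.
    even_sample = [10, 10, 10, 10]
    dominated_sample = [40, 0, 0, 0]
    print("Shannon, even sample:", shannon(even_sample))
    print("Shannon, dominated sample:", shannon(dominated_sample))
    print("Bray-Curtis between them:", bray_curtis(even_sample, dominated_sample))
```

An evenly distributed sample scores higher on the Shannon index than one dominated by a single taxon, and Bray-Curtis ranges from 0 for identical composition to 1 for samples that share no taxa.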

Source journal
Nature Protocols (Biology: Biochemical Research Methods)
CiteScore: 29.10
Self-citation rate: 0.70%
Articles published: 128
Review time: 4 months
About the journal: Nature Protocols focuses on publishing protocols used to address significant biological and biomedical science research questions, including methods grounded in physics and chemistry with practical applications to biological problems. The journal caters to a primary audience of research scientists and, as such, exclusively publishes protocols with research applications. Protocols primarily aimed at influencing patient management and treatment decisions are not featured. The specific techniques covered encompass a wide range, including but not limited to: Biochemistry, Cell biology, Cell culture, Chemical modification, Computational biology, Developmental biology, Epigenomics, Genetic analysis, Genetic modification, Genomics, Imaging, Immunology, Isolation, purification, and separation, Lipidomics, Metabolomics, Microbiology, Model organisms, Nanotechnology, Neuroscience, Nucleic-acid-based molecular biology, Pharmacology, Plant biology, Protein analysis, Proteomics, Spectroscopy, Structural biology, Synthetic chemistry, Tissue culture, Toxicology, and Virology.