Data processing solutions to render metabolomics more quantitative: case studies in food and clinical metabolomics using Metabox 2.0.

IF 11.8, Zone 2 (Biology), JCR Q1, MULTIDISCIPLINARY SCIENCES

GigaScience Pub Date : 2024-01-02 DOI:10.1093/gigascience/giae005

Kwanjeera Wanichthanarak, Ammarin In-On, Sili Fan, Oliver Fiehn, Arporn Wangwiwatsin, Sakda Khoomrung



Abstract

In classic semiquantitative metabolomics, metabolite intensities are affected by biological factors and other unwanted variations. A systematic evaluation of the data processing methods is crucial to identify adequate processing procedures for a given experimental setup. Current comparative studies are mostly focused on peak area data but not on absolute concentrations. In this study, we evaluated data processing methods to produce outputs that were most similar to the corresponding absolute quantified data. We examined the data distribution characteristics, fold difference patterns between 2 metabolites, and sample variance. We used 2 metabolomic datasets from a retail milk study and a lupus nephritis cohort as test cases. When studying the impact of data normalization, transformation, scaling, and combinations of these methods, we found that the cross-contribution compensating multiple standard normalization (ccmn) method, followed by square root data transformation, was most appropriate for a well-controlled study such as the milk study dataset. Regarding the lupus nephritis cohort study, only ccmn normalization could slightly improve the data quality of the noisy cohort. Since the assessment accounted for the resemblance between processed data and the corresponding absolute quantified data, our results denote a helpful guideline for processing metabolomic datasets within a similar context (food and clinical metabolomics). Finally, we introduce Metabox 2.0, which enables thorough analysis of metabolomic data, including data processing, biomarker analysis, integrative analysis, and data interpretation. It was successfully used to process and analyze the data in this study. An online web version is available at http://metsysbio.com/metabox.
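The comparison described in the abstract (checking whether processed semiquantitative data resemble absolute concentrations in terms of fold difference between 2 metabolites and sample variance) can be illustrated with a small sketch. This is not the authors' pipeline and not Metabox 2.0 code: the ccmn normalization step is not implemented here, and the function names, toy values, and per-metabolite response factors are all hypothetical assumptions chosen only to make the idea concrete.

```python
import numpy as np


def sqrt_transform(x):
    """Square-root transform of non-negative intensities."""
    x = np.asarray(x, dtype=float)
    if np.any(x < 0):
        raise ValueError("intensities must be non-negative")
    return np.sqrt(x)


def fold_difference(data, i, j):
    """Per-sample ratio of metabolite i to metabolite j (columns of data)."""
    data = np.asarray(data, dtype=float)
    return data[:, i] / data[:, j]


def coefficient_of_variation(data):
    """Relative spread of each metabolite across samples (rows)."""
    data = np.asarray(data, dtype=float)
    return data.std(axis=0, ddof=1) / data.mean(axis=0)


# Hypothetical toy data: 4 samples x 2 metabolites, absolute concentrations.
absolute = np.array([[1.0, 4.0],
                     [4.0, 16.0],
                     [9.0, 36.0],
                     [16.0, 64.0]])

# Assumed detector response factors turn concentrations into peak areas.
# (Illustrative only; real response factors are unknown in semiquantitative data.)
peak_area = absolute * np.array([100.0, 25.0])

# Transformation step only; a normalization such as ccmn would precede this.
processed = sqrt_transform(peak_area)

print("fold difference (absolute): ", fold_difference(absolute, 0, 1))
print("fold difference (peak area):", fold_difference(peak_area, 0, 1))
print("fold difference (processed):", fold_difference(processed, 0, 1))
print("CV (absolute): ", coefficient_of_variation(absolute))
print("CV (processed):", coefficient_of_variation(processed))
```

In this toy case the fold difference computed from peak areas differs from the one computed from absolute concentrations because the two metabolites respond differently, which is the kind of discrepancy the study's evaluation criteria are meant to expose.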



Source journal

GigaScience (MULTIDISCIPLINARY SCIENCES)

CiteScore: 15.50

Self-citation rate: 1.10%

Articles published: 119

Review time: 1 week

About the journal: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.