Evaluation of normalization strategies for mass spectrometry-based multi-omics datasets.

IF 3.5 · Zone 3 (Medicine) · JCR Q2, Endocrinology & Metabolism

Metabolomics Pub Date : 2025-07-01 DOI:10.1007/s11306-025-02297-1

Chi Yen Tseng, Jessica A Salguero, Joshua D Breidenbach, Emilia Solomon, Claire K Sanders, Tara Harvey, M Grace Thornhill, Salvator J Palmisano, Zachary J Sasiene, Brett R Blackwell, Ethan M McBride, Kes A Luchini, Erick S LeBrun, Marc Alvarez, Phillip M Mach, Emilio S Rivera, Trevor G Glaros

Metabolomics, volume 21, issue 4, p. 98. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12214035/pdf/

Citations: 0

Abstract


Evaluation of normalization strategies for mass spectrometry-based multi-omics datasets.

Introduction: Data normalization is crucial for multi-omics integration, reducing systematic errors and maximizing the likelihood of discovering true biological variation. Most studies assess normalization for a single omics type or use datasets from separate experiments. Few address time-course data, where normalization might bias temporal differentiation. In this study, we compared common normalization methods and a machine learning approach, Systematic Error Removal using Random Forest (SERRF), using multi-omics datasets generated from the same experiment, and even from the same cell lysate.

Objectives: To develop a straightforward process to assess normalization effects and identify the most robust methods across multi-omics datasets.

Methods: We analyzed metabolomics, lipidomics, and proteomics datasets from primary human cardiomyocytes and motor neurons exposed to acetylcholine-active compounds over time. Normalization effectiveness was evaluated based on the improvement in QC feature consistency and on the change in treatment- and time-related variance.
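The two evaluation criteria named in the Methods can be illustrated with a minimal sketch. The abstract does not say which statistics were used, so both choices here are assumptions: QC feature consistency is summarized as the per-feature relative standard deviation (RSD) across QC samples, and treatment- or time-related variance as the fraction of a feature's total sum of squares explained by a grouping factor. The function names `qc_rsd` and `variance_explained` are hypothetical.

```python
import numpy as np

def qc_rsd(X, is_qc):
    """Per-feature RSD (%) across QC samples.

    X: samples x features intensity matrix (assumed positive intensities).
    is_qc: boolean mask marking which rows are QC samples.
    Assumption: RSD is used as the consistency metric; the abstract
    does not name the statistic.
    """
    X = np.asarray(X, dtype=float)
    qc = X[np.asarray(is_qc, dtype=bool)]
    return 100.0 * np.nanstd(qc, axis=0, ddof=1) / np.nanmean(qc, axis=0)

def variance_explained(values, groups):
    """Fraction of total sum of squares explained by a grouping factor
    (e.g. treatment or time point) for one feature.
    Assumes the feature is not constant (total sum of squares > 0)."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    ss_total = ((values - grand_mean) ** 2).sum()
    ss_between = sum(
        values[groups == g].size * (values[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total

# Toy data: three QC injections with drift, plus one study sample.
before = np.array([[100.0, 200.0], [110.0, 220.0], [90.0, 180.0], [500.0, 50.0]])
after = np.array([[100.0, 200.0], [100.0, 200.0], [100.0, 200.0], [480.0, 55.0]])
is_qc = [True, True, True, False]

rsd_before = qc_rsd(before, is_qc)
rsd_after = qc_rsd(after, is_qc)
eta = variance_explained([1.0, 1.0, 3.0, 3.0], ["ctrl", "ctrl", "treated", "treated"])
```

Under these assumptions, a normalization method looks good when the QC RSD drops while the variance explained by treatment or time is not reduced.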

Results: Probabilistic Quotient Normalization (PQN) and Locally Estimated Scatterplot Smoothing (LOESS) QC were identified as optimal for metabolomics and lipidomics, while PQN, Median, and LOESS normalization excelled for proteomics. These methods consistently enhanced QC feature consistency in metabolomics and lipidomics, and preserved time-related variance or treatment-related variance in proteomics, demonstrating their effectiveness and robustness. SERRF normalization, applied only to metabolomics in this study, outperformed other methods in some datasets but inadvertently masked treatment-related variance in others.
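For readers unfamiliar with the two simplest methods named in the Results, the following is a minimal sketch of median normalization and PQN in their commonly described forms. It is not the authors' implementation: the choice of reference profile (the feature-wise median over all samples), the absence of a prior total-intensity scaling step, and the function names are assumptions made for illustration.

```python
import numpy as np

def median_normalize(X):
    """Scale each sample (row) so that its median intensity equals
    the median of all sample medians."""
    X = np.asarray(X, dtype=float)
    sample_medians = np.nanmedian(X, axis=1, keepdims=True)
    target = np.nanmedian(sample_medians)
    return X * (target / sample_medians)

def pqn_normalize(X, reference=None):
    """Probabilistic Quotient Normalization.

    1. Build a reference profile (here: feature-wise median over all
       samples, an assumption; QC samples are another common choice).
    2. Divide each sample by the reference, feature by feature.
    3. Take the median quotient per sample as its dilution factor.
    4. Divide each sample by its dilution factor.
    Assumes strictly positive intensities in the reference.
    """
    X = np.asarray(X, dtype=float)
    if reference is None:
        reference = np.nanmedian(X, axis=0)
    quotients = X / reference
    factors = np.nanmedian(quotients, axis=1, keepdims=True)
    return X / factors

# Toy data: one underlying profile measured at three dilutions.
base = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
dilution = np.array([0.5, 1.0, 2.0])
X = dilution[:, None] * base[None, :]

X_pqn = pqn_normalize(X)
X_med = median_normalize(X)
```

In this toy case both methods recover the same profile for every sample, because the only difference between samples is a global dilution factor. The methods diverge on real data, where PQN's median quotient is less sensitive to a few strongly changing features.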

Conclusion: For multi-omics integration in a temporal study, our evaluation identified PQN and LOESS QC as the top methods for metabolomics and lipidomics, and PQN, Median, and LOESS normalization as the top methods for proteomics.


Source journal

Metabolomics (Medicine: Endocrinology & Metabolism)

CiteScore: 6.60

Self-citation rate: 2.80%

Review time: 2 months

About the journal: Metabolomics publishes current research regarding the development of technology platforms for metabolomics. This includes, but is not limited to: metabolomic applications within man, including pre-clinical and clinical pharmacometabolomics for precision medicine; metabolic profiling and fingerprinting; metabolite target analysis; metabolomic applications within animals, plants and microbes; and transcriptomics and proteomics in systems biology. Metabolomics is an indispensable platform for researchers using new post-genomics approaches to discover networks and interactions between metabolites, pharmaceuticals, SNPs, proteins and more. Its articles go beyond the genome and metabolome by including original clinical study material together with big data from new emerging technologies.