Evaluation of normalization strategies for mass spectrometry-based multi-omics datasets.

Impact factor 3.5; Zone 3 (Medicine); JCR Q2 (Endocrinology & Metabolism)
Chi Yen Tseng, Jessica A Salguero, Joshua D Breidenbach, Emilia Solomon, Claire K Sanders, Tara Harvey, M Grace Thornhill, Salvator J Palmisano, Zachary J Sasiene, Brett R Blackwell, Ethan M McBride, Kes A Luchini, Erick S LeBrun, Marc Alvarez, Phillip M Mach, Emilio S Rivera, Trevor G Glaros
Metabolomics 21(4): 98. Published 2025-07-01. Publication type: Journal Article.
DOI: 10.1007/s11306-025-02297-1 (https://doi.org/10.1007/s11306-025-02297-1)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12214035/pdf/

Abstract

Introduction: Data normalization is crucial for multi-omics integration: it reduces systematic error and maximizes the likelihood of discovering true biological variation. Most studies assess normalization for a single omics type or use datasets from separate experiments. Few address time-course data, where normalization might bias temporal differentiation. In this study, we compared common normalization methods and a machine learning approach, Systematic Error Removal using Random Forest (SERRF), using multi-omics datasets generated from the same experiment, and even from the same cell lysate.

Objectives: To develop a straightforward process to assess normalization effects and identify the most robust methods across multi-omics datasets.

Methods: We analyzed metabolomics, lipidomics, and proteomics datasets from primary human cardiomyocytes and motor neurons exposed to acetylcholine-active compounds over time. Normalization effectiveness was evaluated by the improvement in QC feature consistency and by the change in treatment- and time-related variance.
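As an illustration of what "QC feature consistency" can mean in practice, the sketch below computes the per-feature relative standard deviation (RSD) across repeated QC injections and compares the median RSD before and after normalization. The abstract does not state which consistency metric the study used, so the choice of RSD and the function names `qc_rsd` and `median_rsd_improvement` are assumptions made for this example only.

```python
from statistics import mean, median, stdev


def qc_rsd(qc_rows):
    """Per-feature RSD (%) across QC injections.

    qc_rows: list of equal-length lists, one row per QC sample and
    one column per feature (hypothetical layout for this sketch).
    """
    return [100.0 * stdev(col) / mean(col) for col in zip(*qc_rows)]


def median_rsd_improvement(raw_qc, normalized_qc):
    """Drop in median QC RSD (percentage points) after normalization.

    Positive values mean the QC features became more consistent.
    """
    return median(qc_rsd(raw_qc)) - median(qc_rsd(normalized_qc))


# Three QC injections, two features, with injection-to-injection drift.
raw_qc = [[100, 200], [110, 220], [90, 180]]
# The same injections after an idealized normalization removed the drift.
normalized_qc = [[100, 200], [100, 200], [100, 200]]

improvement = median_rsd_improvement(raw_qc, normalized_qc)
```

A lower median RSD after normalization suggests that technical variation among QC injections was reduced; it says nothing by itself about whether treatment- or time-related variance was preserved, which has to be checked separately.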

Results: Probabilistic Quotient Normalization (PQN) and Locally Estimated Scatterplot Smoothing (LOESS) QC were identified as optimal for metabolomics and lipidomics, while PQN, Median, and LOESS normalization excelled for proteomics. These methods consistently enhanced QC feature consistency in metabolomics and lipidomics, and preserved time-related variance or treatment-related variance in proteomics, demonstrating their effectiveness and robustness. SERRF normalization, applied only to metabolomics in this study, outperformed other methods in some datasets but inadvertently masked treatment-related variance in others.
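For readers unfamiliar with the two simplest methods named above, the following is a minimal sketch of PQN and median normalization. It is not the study's implementation: the abstract does not describe the software or settings used, and the choice of reference profile here (the feature-wise median over all supplied samples, rather than over QC samples only) is an assumption, as are the function names `pqn` and `median_normalize`.

```python
from statistics import median


def pqn(samples, reference=None):
    """Probabilistic Quotient Normalization.

    samples: list of rows, one per sample, holding positive intensities.
    reference: optional reference profile; defaults to the feature-wise
    median across the supplied samples (an assumption of this sketch).
    """
    if reference is None:
        reference = [median(col) for col in zip(*samples)]
    normalized = []
    for row in samples:
        # Quotient of each feature against the reference profile.
        quotients = [x / r for x, r in zip(row, reference) if r > 0]
        # The median quotient estimates the sample's dilution factor.
        factor = median(quotients)
        normalized.append([x / factor for x in row])
    return normalized


def median_normalize(samples):
    """Scale each sample so its median matches the median of sample medians."""
    sample_medians = [median(row) for row in samples]
    target = median(sample_medians)
    return [[x * target / m for x in row] for row, m in zip(samples, sample_medians)]


# One underlying profile measured at relative dilutions 1, 2 and 0.5.
data = [[10, 20, 30], [20, 40, 60], [5, 10, 15]]
pqn_result = pqn(data)
median_result = median_normalize(data)
```

In this toy input every sample differs from the others only by a global dilution factor, so both methods recover the same profile. The methods diverge on real data, where a subset of features changes with treatment or time: PQN's median quotient is designed to be less sensitive to such features than a scaling based on the sample's own intensity distribution.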

Conclusion: For multi-omics integration in a temporal study, our evaluation identified PQN and LOESS QC as the top methods for metabolomics and lipidomics, and PQN, Median, and LOESS normalization as the top methods for proteomics.

Source journal: Metabolomics (Medicine: Endocrinology & Metabolism)
CiteScore: 6.60
Self-citation rate: 2.80%
Articles published: 84
Review time: 2 months
Journal description: Metabolomics publishes current research regarding the development of technology platforms for metabolomics. This includes, but is not limited to: metabolomic applications within man, including pre-clinical and clinical pharmacometabolomics for precision medicine; metabolic profiling and fingerprinting; metabolite target analysis; metabolomic applications within animals, plants and microbes; and transcriptomics and proteomics in systems biology. Metabolomics is an indispensable platform for researchers using new post-genomics approaches to discover networks and interactions between metabolites, pharmaceuticals, SNPs, proteins and more. Its articles go beyond the genome and metabolome by including original clinical study material together with big data from new emerging technologies.