On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.

IF 6.7, Zone 1 (Chemistry), JCR Q1 (CHEMISTRY, ANALYTICAL)

Analytical Chemistry, Pub Date: 2025-06-12, DOI: 10.1021/acs.analchem.5c01049

Thibaud Godon, Pier-Luc Plante, Jacques Corbeil, Pascal Germain, Alexandre Drouin


Citations: 0

Abstract


On Selecting Robust Approaches for Learning Predictive Biomarkers in Metabolomics Data Sets.


Metabolomics, the study of small molecules within biological systems, offers insights into metabolic processes and, consequently, holds great promise for advancing health outcomes. Biomarker discovery in metabolomics represents a significant challenge, notably due to the high dimensionality of the data. Recent work has addressed this problem by analyzing the most important variables in machine learning models. Unfortunately, this approach relies on prior hypotheses about the structure of the data and may overlook simple patterns. To assess the true usefulness of machine learning methods, we evaluate them on a collection of 835 metabolomics data sets. This effort provides valuable insights for metabolomics researchers regarding where and when to use machine learning. It also establishes a benchmark for the evaluation of future methods. Nonetheless, the results emphasize the high diversity of data sets in metabolomics and the complexity of finding biologically relevant biomarkers. As a result, we propose a novel approach applicable across all data sets, offering guidance for future analyses. This method involves directly comparing univariate and multivariate models. We demonstrate through selected examples how this approach can guide data analysis across diverse data set structures, representative of the observed variability. Code and data are available for research purposes.
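The idea of directly comparing univariate and multivariate models can be illustrated with a minimal sketch. This is not the authors' code: the synthetic data, the midpoint-threshold single-feature classifier, the nearest-centroid classifier, and every function name below are assumptions chosen only to show the shape of such a comparison under cross-validation.

```python
import random
from statistics import mean


def make_data(n=120, n_features=20, seed=0):
    """Synthetic two-class data; only feature 0 carries signal (an assumption)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so every fold stays balanced
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label  # shift the single informative feature
        X.append(row)
        y.append(label)
    return X, y


def class_means(X, y, j):
    """Mean of feature j within class 0 and within class 1."""
    m0 = mean(row[j] for row, lab in zip(X, y) if lab == 0)
    m1 = mean(row[j] for row, lab in zip(X, y) if lab == 1)
    return m0, m1


def fit_univariate(X, y):
    """Pick the one feature whose midpoint threshold best splits the training data."""
    best = None
    for j in range(len(X[0])):
        m0, m1 = class_means(X, y, j)
        thr = (m0 + m1) / 2
        sign = 1 if m1 >= m0 else -1
        acc = mean(
            int((sign * (row[j] - thr) > 0) == (lab == 1)) for row, lab in zip(X, y)
        )
        if best is None or acc > best[0]:
            best = (acc, j, thr, sign)
    _, j, thr, sign = best
    return lambda row: int(sign * (row[j] - thr) > 0)


def fit_multivariate(X, y):
    """Nearest-centroid classifier that uses every feature at once."""
    centroids = [class_means(X, y, j) for j in range(len(X[0]))]
    c0 = [m0 for m0, _ in centroids]
    c1 = [m1 for _, m1 in centroids]

    def predict(row):
        d0 = sum((a - b) ** 2 for a, b in zip(row, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(row, c1))
        return int(d1 < d0)

    return predict


def cv_accuracy(fit, X, y, k=5):
    """Mean held-out accuracy over k interleaved folds."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(mean(int(model(X[i]) == y[i]) for i in test_idx))
    return mean(scores)


X, y = make_data()
uni_acc = cv_accuracy(fit_univariate, X, y)
multi_acc = cv_accuracy(fit_multivariate, X, y)
print(f"univariate CV accuracy:   {uni_acc:.2f}")
print(f"multivariate CV accuracy: {multi_acc:.2f}")
```

One possible reading of such a comparison: when the two held-out scores are close, a single variable may already explain most of the predictive signal, whereas a clear advantage for the multivariate model would suggest a pattern spread across several variables. The feature is selected inside each training fold so that the held-out score is not inflated by selection on test samples.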


Source journal

Analytical Chemistry (Chemistry: Analytical Chemistry)

CiteScore: 12.10
Self-citation rate: 12.20%
Articles published: 1949
Review time: 1.4 months

Journal description: Analytical Chemistry, a peer-reviewed research journal, focuses on disseminating new and original knowledge across all branches of analytical chemistry. Fundamental articles may explore general principles of chemical measurement science and need not directly address existing or potential analytical methodology. They can be entirely theoretical or report experimental results. Contributions may cover various phases of analytical operations, including sampling, bioanalysis, electrochemistry, mass spectrometry, microscale and nanoscale systems, environmental analysis, separations, spectroscopy, chemical reactions and selectivity, instrumentation, imaging, surface analysis, and data processing. Papers discussing known analytical methods should present a significant, original application of the method, a notable improvement, or results on an important analyte.