A machine learning framework for classifying lipids in untargeted metabolomics using mass-to-charge ratios and retention times.

IF 3.3 | Zone 3, Medicine | JCR Q2, Endocrinology & Metabolism
Christelle Colin-Leitzinger, Yonatan Ayalew Mekonnen, Isis Narvaez-Bandera, Vanessa Y Rubio, Dalia Ercan, Eric A Welsh, Lancia N F Darville, Min Liu, Hayley D Ackerman, Julian Avila-Pacheco, Clary B Clish, Kevin Hicks, John M Koomen, Nancy Gillis, Brooke L Fridley, Elsa R Flores, Oana A Zeleznik, Paul A Stewart
Metabolomics, vol. 21, no. 6, p. 151. Journal article, published 2025-10-18. DOI: 10.1007/s11306-025-02343-y (https://doi.org/10.1007/s11306-025-02343-y). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12535499/pdf/

Abstract

Introduction: The identification of unknown metabolites remains a major challenge in untargeted metabolomics using liquid chromatography-mass spectrometry (LC-MS). This process typically depends on comparing mass spectral or chromatographic data to reference databases or deciphering complex fragmentation in tandem mass spectra. While current machine learning methods can predict metabolite structures using MS/MS (MS2) data, no approaches, to our knowledge, use only mass-to-charge ratio (m/z) and retention time (RT) from LC-MS data.

Objective: To explore the potential of using the mass-to-charge ratio (m/z) and retention time (RT) from LC-MS data as standalone predictors for metabolite classification, and to propose a modeling framework that can be implemented internally on standalone datasets.

Methods: We trained machine learning models on 20 mouse lung adenocarcinoma tumor samples with 7,353 features and validated them on a dataset of 81 samples with 22,000 features. A total of 120 combinations of preprocessors and models were assessed. Features were classified as "lipid" or "non-lipid" based on the Human Metabolome Database (HMDB) taxonomy, and model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), and area under the precision-recall curve (PR). We replicated the process in an independent dataset generated using human plasma samples.
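The three metrics named in the Methods can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores below are assumptions made for illustration only (1 stands for "lipid", 0 for "non-lipid"), and the PR area is computed as step-wise average precision, which is one common way to summarize the precision-recall curve.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of features whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return hits / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Tied scores count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Averages the precision at every rank where a positive is retrieved.
    Assumes there are no tied scores and at least one positive label.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda t: -t[0])
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives


# Toy example (hypothetical values, not data from the study).
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(accuracy(y_true, y_score), 4))           # 0.8333
print(round(roc_auc(y_true, y_score), 4))            # 0.8889
print(round(average_precision(y_true, y_score), 4))  # 0.9167
```

Accuracy depends on the chosen threshold, whereas AUC and PR summarize ranking quality over all thresholds, which is one reason to report all three when classes may be imbalanced.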

Results: We classified untargeted LC-MS features as "lipid" or "non-lipid" per the HMDB super class taxonomy and evaluated model performance. We designed a framework that includes steps for choosing the preprocessors and models for metabolite classification. In our laboratory's datasets, tree-based models performed best across all metrics, achieving high accuracy, AUC, and PR, consistent with the results in the independent dataset.
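The idea of screening preprocessor and model combinations on m/z and RT alone can be sketched as follows. This is a toy illustration under stated assumptions, not the published framework: the data are synthetic, the three preprocessors and two models (a depth-1 decision tree and a nearest-centroid classifier) are hypothetical stand-ins for the 120 combinations the study assessed, and a single hold-out split with accuracy replaces the study's evaluation scheme.

```python
import itertools
import random
import statistics

# Synthetic (m/z, RT) features; label 1 = "lipid", 0 = "non-lipid".
# The distributions are invented for illustration and are not from the study.
rng = random.Random(0)


def make_feature(is_lipid):
    if is_lipid:
        return (rng.gauss(750, 80), rng.gauss(9.0, 1.0)), 1
    return (rng.gauss(250, 80), rng.gauss(4.0, 1.0)), 0


data = [make_feature(i % 2 == 0) for i in range(200)]
rng.shuffle(data)
train, test = data[:150], data[150:]


# Preprocessors: fit on training features, return a transform function.
def identity(X):
    return lambda x: x


def zscore(X):
    cols = list(zip(*X))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) or 1.0 for c in cols]
    return lambda x: tuple((v - m) / s for v, m, s in zip(x, mu, sd))


def minmax(X):
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return lambda x: tuple((v - l) / s for v, l, s in zip(x, lo, span))


# Models: fit on (X, y), return a predict function.
def stump(X, y):
    """Depth-1 decision tree: the single-feature threshold with best training accuracy."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for sign in (1, -1):
                acc = sum((sign * (x[j] - t) >= 0) == bool(lab) for x, lab in zip(X, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, j, t, sign)
    _, j, t, sign = best
    return lambda x: int(sign * (x[j] - t) >= 0)


def nearest_centroid(X, y):
    """Assign each feature to the class whose training mean is closest."""
    cents = {}
    for lab in set(y):
        pts = [x for x, l in zip(X, y) if l == lab]
        cents[lab] = tuple(statistics.fmean(c) for c in zip(*pts))
    return lambda x: min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))


def evaluate(prep, model):
    """Fit preprocessor and model on the training split; return hold-out accuracy."""
    X_tr = [x for x, _ in train]
    y_tr = [y for _, y in train]
    transform = prep(X_tr)
    predict = model([transform(x) for x in X_tr], y_tr)
    return sum(predict(transform(x)) == y for x, y in test) / len(test)


preprocessors = [identity, zscore, minmax]
models = [stump, nearest_centroid]
results = {
    (p.__name__, m.__name__): evaluate(p, m)
    for p, m in itertools.product(preprocessors, models)
}
best_combo = max(results, key=results.get)
print(best_combo, results[best_combo])
```

Fitting each preprocessor on the training split only, and then applying it unchanged to the hold-out features, keeps information from the evaluation data out of the model selection step.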

Conclusion: Our results demonstrate that metabolites can be classified as "lipid" or "non-lipid" using only m/z and RT from untargeted LC-MS data, without requiring MS2 spectra. Although this study focused on lipid classification, the approach shows potential for broader application, which warrants further investigation across diverse compound classes, detection methods, and chromatographic conditions.

Source journal: Metabolomics (Medicine: Endocrinology & Metabolism)
CiteScore: 6.60
Self-citation rate: 2.80%
Articles published: 84
Review time: 2 months
About the journal: Metabolomics publishes current research regarding the development of technology platforms for metabolomics. This includes, but is not limited to: metabolomic applications within man, including pre-clinical and clinical pharmacometabolomics for precision medicine; metabolic profiling and fingerprinting; metabolite target analysis; metabolomic applications within animals, plants and microbes; and transcriptomics and proteomics in systems biology. Metabolomics is an indispensable platform for researchers using new post-genomics approaches to discover networks and interactions between metabolites, pharmaceuticals, SNPs, proteins and more. Its articles go beyond the genome and metabolome by including original clinical study material together with big data from new emerging technologies.