A machine learning framework for classifying lipids in untargeted metabolomics using mass-to-charge ratios and retention times
Christelle Colin-Leitzinger, Yonatan Ayalew Mekonnen, Isis Narvaez-Bandera, Vanessa Y Rubio, Dalia Ercan, Eric A Welsh, Lancia N F Darville, Min Liu, Hayley D Ackerman, Julian Avila-Pacheco, Clary B Clish, Kevin Hicks, John M Koomen, Nancy Gillis, Brooke L Fridley, Elsa R Flores, Oana A Zeleznik, Paul A Stewart
Metabolomics 21(6): 151. Published 2025-10-18. doi:10.1007/s11306-025-02343-y
Abstract
Introduction: The identification of unknown metabolites remains a major challenge in untargeted metabolomics using liquid chromatography-mass spectrometry (LC-MS). This process typically depends on comparing mass spectral or chromatographic data to reference databases, or on interpreting complex fragmentation patterns in tandem mass spectra. While current machine learning methods can predict metabolite structures from MS/MS (MS2) data, to our knowledge no approach uses only the mass-to-charge ratio (m/z) and retention time (RT) from LC-MS data.
Objective: To explore the potential of the mass-to-charge ratio (m/z) and retention time (RT) from LC-MS data as standalone predictors for metabolite classification, and to propose a modeling framework that can be implemented in-house on individual datasets.
Methods: We trained machine learning models on 20 mouse lung adenocarcinoma tumor samples with 7,353 features and validated them on a dataset of 81 samples with 22,000 features. A total of 120 combinations of preprocessors and models were assessed. Features were classified as "lipid" or "non-lipid" based on the Human Metabolome Database (HMDB) taxonomy, and model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), and area under the precision-recall curve (PR). We replicated the process in an independent dataset generated from human plasma samples.
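The three metrics named in the Methods can be computed from a model's predicted lipid scores. The following is a minimal, dependency-free sketch, assuming binary labels (1 = lipid, 0 = non-lipid) and scores in [0, 1]. The function names and the toy scores are hypothetical, and PR is estimated here as average precision, one common estimator of the area under the precision-recall curve; the abstract does not state which estimator the authors used.

```python
from typing import Sequence

# Labels: 1 = lipid, 0 = non-lipid. Scores: predicted probability of "lipid".
# Assumes both classes are present in y_true.


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of features whose thresholded score matches the label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(y_true, scores))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random lipid outscores a random non-lipid (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """PR-curve area estimated as average precision over the score ranking.

    Tied scores are broken by sort order, a simplification.
    """
    ranked = sorted(zip(scores, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recall step
    return total / sum(y_true)


# Toy labels and scores, invented for illustration only.
y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(accuracy(y, s), roc_auc(y, s), average_precision(y, s))
```

For these toy scores the sketch gives an accuracy of 5/6, a ROC AUC of 8/9, and an average precision of 11/12. For real analyses, scikit-learn's `accuracy_score`, `roc_auc_score`, and `average_precision_score` are tested equivalents.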
Results: We classified untargeted LC-MS features as "lipid" or "non-lipid" according to the HMDB superclass taxonomy and evaluated model performance. We designed a framework with steps for choosing the preprocessors and models used for metabolite classification. On our laboratory's data, tree-based models outperformed the other models on all metrics, achieving high accuracy, AUC, and PR, and this pattern was consistent in the independent dataset.
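The abstract does not list the 120 preprocessor and model combinations, so the following is only a minimal sketch of the selection step, assuming each feature is described by an (m/z, RT) pair with a binary lipid label. It fits every pairing of three hypothetical preprocessors (none, min-max scaling, z-scoring) with two hypothetical models (a depth-1 decision tree standing in for the tree-based family, and a nearest-centroid scorer), then ranks the pairs by validation ROC AUC. All names and toy values are invented for illustration and do not come from the study.

```python
import math
from itertools import product
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]  # (m/z, retention time in minutes)
Scorer = Callable[[Point], float]  # returns a lipid score in [0, 1]


def fit_identity(train: Sequence[Point]) -> Callable[[Point], Point]:
    return lambda p: p


def fit_minmax(train: Sequence[Point]) -> Callable[[Point], Point]:
    lo = [min(p[i] for p in train) for i in range(2)]
    hi = [max(p[i] for p in train) for i in range(2)]
    span = [(h - l) or 1.0 for l, h in zip(lo, hi)]  # guard against constant columns
    return lambda p: ((p[0] - lo[0]) / span[0], (p[1] - lo[1]) / span[1])


def fit_zscore(train: Sequence[Point]) -> Callable[[Point], Point]:
    n = len(train)
    mu = [sum(p[i] for p in train) / n for i in range(2)]
    sd = [math.sqrt(sum((p[i] - mu[i]) ** 2 for p in train) / n) or 1.0 for i in range(2)]
    return lambda p: ((p[0] - mu[0]) / sd[0], (p[1] - mu[1]) / sd[1])


def fit_stump(X: Sequence[Point], y: Sequence[int]) -> Scorer:
    """Depth-1 decision tree: the single m/z or RT split with the best training accuracy."""
    best = None
    for i in range(2):
        for t in sorted({x[i] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[i] <= t]
            right = [lab for x, lab in zip(X, y) if x[i] > t]
            # Each leaf scores a feature by the fraction of lipids it contains.
            pl = sum(left) / len(left) if left else 0.5
            pr = sum(right) / len(right) if right else 0.5
            acc = sum(int((pl if x[i] <= t else pr) >= 0.5) == lab for x, lab in zip(X, y))
            if best is None or acc > best[0]:
                best = (acc, i, t, pl, pr)
    _, i, t, pl, pr = best
    return lambda x: pl if x[i] <= t else pr


def fit_centroid(X: Sequence[Point], y: Sequence[int]) -> Scorer:
    """Nearest-centroid scorer: closer to the lipid centroid means a higher score."""
    def centroid(label: int) -> Point:
        pts = [x for x, lab in zip(X, y) if lab == label]
        return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

    c_lipid, c_other = centroid(1), centroid(0)

    def score(x: Point) -> float:
        d_lipid, d_other = math.dist(x, c_lipid), math.dist(x, c_other)
        return 0.5 if d_lipid + d_other == 0 else d_other / (d_lipid + d_other)

    return score


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


PREPROCESSORS = {"none": fit_identity, "minmax": fit_minmax, "zscore": fit_zscore}
MODELS = {"stump": fit_stump, "centroid": fit_centroid}


def rank_combinations(X_train, y_train, X_val, y_val):
    """Fit every preprocessor/model pair on training data; rank by validation ROC AUC."""
    results = []
    for (p_name, fit_p), (m_name, fit_m) in product(PREPROCESSORS.items(), MODELS.items()):
        transform = fit_p(X_train)  # fit scaling on the training split only, to avoid leakage
        model = fit_m([transform(x) for x in X_train], y_train)
        scores = [model(transform(x)) for x in X_val]
        results.append((roc_auc(y_val, scores), p_name, m_name))
    return sorted(results, reverse=True)


# Toy (m/z, RT) features, invented for illustration only.
X_train = [(150.0, 1.2), (180.0, 2.0), (210.0, 1.5), (760.0, 12.5), (790.0, 13.1), (820.0, 11.8)]
y_train = [0, 0, 0, 1, 1, 1]
X_val = [(165.0, 1.8), (200.0, 2.4), (770.0, 12.0), (805.0, 12.9)]
y_val = [0, 0, 1, 1]

for auc, p_name, m_name in rank_combinations(X_train, y_train, X_val, y_val):
    print(f"{p_name:>6} + {m_name:<8} AUC={auc:.3f}")
```

The toy data are trivially separable, so every pairing scores perfectly here; the point is the loop structure, not the ranking. A fuller version would likely use library components (for example, a scikit-learn `Pipeline` combining a scaler with a `RandomForestClassifier`) and rank on all three metrics rather than AUC alone.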
Conclusion: Our results demonstrate that metabolites can be classified as "lipid" or "non-lipid" using only m/z and RT from untargeted LC-MS data, without requiring MS2 spectra. Although this study focused on lipid classification, the approach shows potential for broader application and warrants further investigation across diverse compound classes, detection methods, and chromatographic conditions.