Integration of deep neural network modeling and LC-MS-based pseudo-targeted metabolomics to discriminate easily confused ginseng species

Meiting Jiang, Yuyang Sha, Yadan Zou, Xiaoyan Xu, Mengxiang Ding, Xu Lian, Hongda Wang, Qilong Wang, Kefeng Li, De-An Guo, Wenzhi Yang

Journal of Pharmaceutical Analysis, 15(1): 101116 (2025). Epub 26 September 2024.
DOI: 10.1016/j.jpha.2024.101116
PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11788866/pdf/
Abstract
Metabolomics has a wide range of applications in the life sciences, biomedicine, and phytology. Data acquisition (which should achieve high coverage and efficiency) and data analysis (which should yield good classification) are the two key stages of a metabolomics workflow. Various chemometric approaches based on pattern recognition or machine learning have been used to separate sample groups. However, insufficient feature extraction, inappropriate feature selection, overfitting, or underfitting limit their ability to discriminate plants that are easily confused. Using two ginseng varieties that contain similar ginsenosides, Panax japonicus (PJ) and Panax japonicus var. major (PJvm), we integrated pseudo-targeted metabolomics and deep neural network (DNN) modeling to achieve accurate species differentiation. The pseudo-targeted metabolomics approach was optimized with respect to the data acquisition mode, ion pair generation, the choice between multiple reaction monitoring (MRM) and scheduled MRM (sMRM), and the chromatographic elution gradient. In total, 1980 ion pairs were monitored within 23 min, enabling the most comprehensive ginseng metabolome analysis. Using both the entire metabolome dataset and a feature-selected dataset, the established DNN model showed excellent classification performance in terms of accuracy, precision, recall, F1 score, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC), outperforming random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), and multilayer perceptron (MLP) models. Moreover, DNNs offered advantages in automated feature learning, nonlinear modeling, adaptability, and generalization. This study confirmed the practicality of the established strategy for efficient metabolomics data analysis and reliable classification, even with small-volume samples. The approach holds promise for plant metabolomics in general, not only for ginseng.
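The evaluation metrics named in the abstract can be illustrated with a minimal sketch for a binary PJ-versus-PJvm task. It assumes that the classifier (the DNN or any baseline such as RF, SVM, XGBoost, or MLP) outputs a probability for one class, arbitrarily treated here as the positive class (1 = PJvm is a labeling assumption), and that a hypothetical 0.5 decision threshold is used. The helper names `binary_metrics` and `roc_auc` are illustrative and not taken from the study; AUC is computed through its rank interpretation (the probability that a random positive sample scores above a random negative one) rather than by tracing the ROC curve explicitly.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    auc: float


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def binary_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> BinaryMetrics:
    """Accuracy, precision, recall, F1 (at `threshold`) and threshold-free AUC.

    Assumption: label 1 is the positive class (here, hypothetically, PJvm).
    """
    if len(y_true) == 0 or len(y_true) != len(scores):
        raise ValueError("y_true and scores must be non-empty and equal length")
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1, roc_auc(y_true, scores))


if __name__ == "__main__":
    # Toy labels and scores invented for illustration, not data from the study.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
    print(binary_metrics(labels, probs))
```

The pairwise AUC loop costs O(n_pos × n_neg), which is adequate for the small sample sets typical of such studies; for larger datasets, a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be preferable.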