Species discrimination and VIP-stacking quantitative models for Curcumae Rhizoma utilizing multi-modal spectra combined with machine learning algorithm

IF 3.1; Zone 3, Medicine; JCR Q2, CHEMISTRY, ANALYTICAL
Xueyang Ren, Youyi Sun, Ting He, Jiamu Ma, Jianling Yao, Mingxia Li, Mengyu Sun, Wei Liu, Feng Zhang, Yu Cao, Yongqi Yang, Letian Ying, Yuqing Yang, Ruijuan Yuan, Gaimei She
Journal of Pharmaceutical and Biomedical Analysis, volume 266, Article 117092. Published 2025-08-07.
DOI: 10.1016/j.jpba.2025.117092
https://www.sciencedirect.com/science/article/pii/S0731708525004339
Citations: 0

Abstract

Curcumae Rhizoma (Ezhu) is a multi-species herbal medicine with excellent medicinal value and development potential. However, its varieties are difficult to differentiate, and current methods for determining the content of minor components are time-consuming and cumbersome, so improved approaches are needed. Spectroscopic techniques combined with chemometrics offer a powerful alternative for developing qualitative and quantitative models, and spectral data fusion has emerged as a key research hotspot. This study employed multi-modal spectroscopy, including Fourier transform infrared (FT-IR), Fourier transform near-infrared (FT-NIR), and ultraviolet (UV) spectroscopy, combined with multivariate algorithms to establish species discrimination and content prediction models for minor constituents in Ezhu. For qualitative analysis, linear discriminant analysis (LDA), k-nearest neighbor (KNN), and decision tree (DT) models based on fused UV+FT-NIR+FT-IR spectral data achieved 100% classification accuracy. For quantitative analysis, a novel variable importance in projection (VIP)-guided stacking ensemble strategy was proposed, leveraging VIP scores derived from partial least squares regression (PLSR) to optimize base-learner combinations. This approach successfully constructed robust models for predicting the content of (3,5-dihydroxy-1-(3,4-dihydroxyphenyl)-7-(4-hydroxyphenyl)-heptane), (1,7-bis-(4-hydroxychalcone)-3,5-dihydroxy-heptane), (3S,5S)-3-acetoxy-5-hydroxy-1-(3,4-dihydroxyphenyl)-7-(4-hydroxyphenyl)-heptane, germacrone, zederone, curzerene, and curdione. Compared to conventional machine learning models and prior studies, the VIP-stacking ensemble models demonstrated superior predictive accuracy and robustness. This work highlights the efficacy of spectral data fusion in both qualitative and quantitative analyses and validates the potential of VIP-stacking ensemble strategies to enhance the performance of content prediction models. This study not only offers a more effective way to identify Ezhu and control its quality, but also provides a promising approach for species authentication and quality control in pharmaceutical, agricultural, and food science applications.
Source journal
CiteScore: 6.70
Self-citation rate: 5.90%
Articles published: 588
Review time: 37 days
Journal description: This journal is an international medium directed towards the needs of academic, clinical, government and industrial analysis by publishing original research reports and critical reviews on pharmaceutical and biomedical analysis. It covers the interdisciplinary aspects of analysis in the pharmaceutical, biomedical and clinical sciences, including developments in analytical methodology, instrumentation, computation and interpretation. Submissions on novel applications focusing on drug purity and stability studies, pharmacokinetics, therapeutic monitoring, metabolic profiling; drug-related aspects of analytical biochemistry and forensic toxicology; quality assurance in the pharmaceutical industry are also welcome. Studies from areas of well-established and poorly selective methods, such as UV-VIS spectrophotometry (including derivative and multi-wavelength measurements), basic electroanalytical (potentiometric, polarographic and voltammetric) methods, fluorimetry, flow-injection analysis, etc. are accepted for publication in exceptional cases only, if a unique and substantial advantage over presently known systems is demonstrated. The same applies to the assay of simple drug formulations by any kind of method and the determination of drugs in biological samples based merely on spiked samples. Drug purity/stability studies should contain information on the structure elucidation of the impurities/degradants.