Species discrimination and VIP-stacking quantitative models for Curcumae Rhizoma utilizing multi-modal spectra combined with machine learning algorithm
Xueyang Ren, Youyi Sun, Ting He, Jiamu Ma, Jianling Yao, Mingxia Li, Mengyu Sun, Wei Liu, Feng Zhang, Yu Cao, Yongqi Yang, Letian Ying, Yuqing Yang, Ruijuan Yuan, Gaimei She
Journal of Pharmaceutical and Biomedical Analysis, Vol. 266, Article 117092, published 2025-08-07. DOI: 10.1016/j.jpba.2025.117092
Curcumae Rhizoma (Ezhu) is a multi-species herbal medicine with excellent medicinal value and development potential. However, its varieties are difficult to differentiate, and current methods for determining the content of minor components are time-consuming and cumbersome, so improved approaches are needed. Spectroscopic techniques combined with chemometrics offer a powerful alternative for developing qualitative and quantitative models, and spectral data fusion has emerged as a key research hotspot. This study employed multi-modal spectroscopy, including Fourier transform infrared (FT-IR), Fourier transform near-infrared (FT-NIR), and ultraviolet (UV) spectroscopy, combined with multivariate algorithms to establish species discrimination models and content prediction models for minor constituents in Ezhu. For qualitative analysis, linear discriminant analysis (LDA), k-nearest neighbor (KNN), and decision tree (DT) models based on fused UV+FT-NIR+FT-IR spectral data achieved 100% classification accuracy. For quantitative analysis, a novel variable importance in projection (VIP)-guided stacking ensemble strategy was proposed, which leverages VIP scores derived from partial least squares regression (PLSR) to optimize base-learner combinations. This approach yielded robust models for predicting the content of 3,5-dihydroxy-1-(3,4-dihydroxyphenyl)-7-(4-hydroxyphenyl)-heptane, 1,7-bis-(4-hydroxychalcone)-3,5-dihydroxy-heptane, (3S,5S)-3-acetoxy-5-hydroxy-1-(3,4-dihydroxyphenyl)-7-(4-hydroxyphenyl)-heptane, germacrone, zederone, curzerene, and curdione. Compared with conventional machine learning models and prior studies, the VIP-stacking ensemble models demonstrated superior predictive accuracy and robustness. This work highlights the efficacy of spectral data fusion in both qualitative and quantitative analyses and validates the potential of VIP-stacking ensemble strategies to improve the performance of content prediction models.
This study not only offers a more effective way to identify Ezhu and control its quality, but also provides a promising approach for species authentication and quality control in pharmaceutical, agricultural, and food science applications.
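The abstract reports 100% classification accuracy for LDA, KNN and DT models built on fused UV+FT-NIR+FT-IR spectra, but it does not state the fusion level or the preprocessing. A minimal sketch, assuming low-level fusion (each spectral block autoscaled with parameters fitted on the training set, weighted by 1/sqrt(p) so that the UV, FT-NIR and FT-IR blocks contribute equal total variance, then concatenated) followed by a plain Euclidean KNN classifier. The function names, the block weighting and the toy data are illustrative assumptions, not the authors' pipeline.

```python
import math
from collections import Counter


def fit_block_scaler(block):
    """Per-column (mean, sample std) of one spectral block; rows are samples."""
    n = len(block)
    params = []
    for col in zip(*block):
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        params.append((mean, std if std > 0 else 1.0))  # constant column: centre only
    return params


def apply_block_scaler(block, params):
    """Autoscale each column, then weight the whole block by 1/sqrt(p_block) so
    that every block carries the same total variance after fusion."""
    weight = 1.0 / math.sqrt(len(params))
    return [[weight * (v - m) / s for v, (m, s) in zip(row, params)] for row in block]


def fuse(blocks, scalers):
    """Low-level fusion: scale each block, then concatenate blocks sample by sample."""
    scaled = [apply_block_scaler(b, p) for b, p in zip(blocks, scalers)]
    return [[v for part in parts for v in part] for parts in zip(*scaled)]


def knn_classify(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy data (invented, not from the study): two hypothetical species, each sample
# with a 2-point "UV", a 3-point "FT-NIR" and a 2-point "FT-IR" block.
uv = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [2.0, 1.0], [2.1, 1.1], [1.9, 0.9]]
nir = [[0.5, 0.4, 0.3], [0.6, 0.5, 0.3], [0.5, 0.5, 0.4],
       [0.9, 0.2, 0.1], [1.0, 0.3, 0.1], [0.9, 0.3, 0.2]]
ir = [[10, 12], [11, 12], [10, 13], [20, 5], [21, 6], [19, 5]]
labels = ["A", "A", "A", "B", "B", "B"]

scalers = [fit_block_scaler(b) for b in (uv, nir, ir)]  # fit on training data only
X_train = fuse([uv, nir, ir], scalers)
query = fuse([[[1.0, 2.0]], [[0.55, 0.45, 0.35]], [[10.5, 12.5]]], scalers)[0]
print(knn_classify(X_train, labels, query))  # A
```

The block weighting matters because, after autoscaling, every variable has unit variance, so a block's total variance grows with its number of variables; without the 1/sqrt(p) factor, a long FT-NIR block would dominate the distances over a short UV block.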
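VIP scores summarise how much each spectral variable contributes to a PLSR model. The abstract does not give the formula or the PLS variant, so the sketch below assumes the standard definition for a single-response PLS (PLS1) fitted by NIPALS on mean-centred data: VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where p is the number of variables, w_a is the weight vector of component a, and SS_a = q_a² t_aᵀt_a is the y-variance explained by that component. The "VIP > 1" cut-off is a common convention, not necessarily the threshold the authors used; the function name and the toy data are hypothetical.

```python
import math


def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS on mean-centred data and return one VIP score per variable.

    X: list of n samples, each a list of p variables; y: list of n responses.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_means)] for row in X]  # X residual matrix
    f = [v - y_mean for v in y]                                # y residual vector
    weights, ss = [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # w = E^T f
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]                                      # unit-length weights
        t = [sum(e * wj for e, wj in zip(row, w)) for row in E]        # scores t = E w
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(fi * ti for fi, ti in zip(f, t)) / tt                  # y loading
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        weights.append(w)
        ss.append(q * q * tt)  # SS_a: y-variance explained by component a
    total = sum(ss)
    return [math.sqrt(p * sum(s * wa[j] ** 2 for s, wa in zip(ss, weights)) / total)
            for j in range(p)]


# Toy data (invented): y mostly follows the first variable.
X = [[1, 5, 2], [2, 3, 2], [3, 4, 1], [4, 1, 3], [5, 2, 2]]
y = [1.1, 2.0, 2.9, 4.2, 5.0]
vip = pls1_vip(X, y, n_components=1)
selected = [j for j, v in enumerate(vip) if v > 1.0]  # common "VIP > 1" rule
print(selected)  # [0, 1]
```

Because every w_a has unit length, the squared VIP scores always sum to p, so a VIP above 1 marks a variable that is more important than the average variable.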
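The abstract describes the VIP-stacking strategy only as "leveraging VIP scores derived from PLSR to optimize base-learner combinations"; it does not name the base learners, the meta-learner, or the selection rule. The sketch below shows one possible reading under stated assumptions, not the authors' method: variables whose VIP exceeds a threshold form the input of every base learner (here a nearly unpenalised linear model and a KNN regressor, both hypothetical choices), and a linear meta-learner is fitted on out-of-fold base predictions, as in standard stacking. VIP scores are taken as an input, for example from a PLS1 model; all helper names (`solve`, `fit_ridge`, `vip_stacking`, etc.) are illustrative.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


def fit_ridge(Z, y, lam):
    """Ridge regression with an unpenalised intercept; returns [b0, b1, ..., bk]."""
    A = [[1.0] + list(row) for row in Z]
    k = len(A[0])
    AtA = [[sum(a[i] * a[j] for a in A) + (lam if i == j and i > 0 else 0.0)
            for j in range(k)] for i in range(k)]
    Aty = [sum(a[i] * yi for a, yi in zip(A, y)) for i in range(k)]
    return solve(AtA, Aty)


def predict_linear(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def linear_learner(X, y):
    """Hypothetical base learner 1: almost unpenalised linear regression."""
    beta = fit_ridge(X, y, lam=1e-6)
    return lambda row: predict_linear(beta, row)


def knn_learner(X, y, k=3):
    """Hypothetical base learner 2: k-nearest-neighbour regression."""
    def predict(row):
        nearest = sorted(zip((math.dist(r, row) for r in X), y))[:k]
        return sum(v for _, v in nearest) / len(nearest)
    return predict


def vip_stacking(X, y, vip, learners, threshold=1.0, n_folds=5):
    """Stack base learners trained on the variables whose VIP exceeds `threshold`."""
    cols = [j for j, v in enumerate(vip) if v > threshold]  # VIP-guided variable subset
    Xs = [[row[j] for j in cols] for row in X]
    n = len(Xs)
    oof = [[0.0] * len(learners) for _ in range(n)]  # out-of-fold base predictions
    for f in range(n_folds):
        test_idx = list(range(f, n, n_folds))         # interleaved folds
        train_idx = [i for i in range(n) if i % n_folds != f]
        for m, learner in enumerate(learners):
            model = learner([Xs[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                oof[i][m] = model(Xs[i])
    meta = fit_ridge(oof, y, lam=1e-8)                # linear meta-learner
    bases = [learner(Xs, y) for learner in learners]  # refit base learners on all data

    def predict(row):
        sub = [row[j] for j in cols]
        return predict_linear(meta, [b(sub) for b in bases])
    return predict, meta


# Toy data (invented): y depends only on the first variable; the VIP scores are
# made-up stand-ins for scores that would come from a PLSR model.
X = [[x, (7 * x) % 5] for x in range(1, 11)]
y = [2.0 * x + 1.0 for x in range(1, 11)]
predict, meta = vip_stacking(X, y, vip=[1.4, 0.2], learners=[linear_learner, knn_learner])
print(round(predict([5.5, 3.0]), 3))  # 12.0
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for samples the base learners were trained on, keeps it from simply rewarding whichever base learner overfits most.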
About the journal:
This journal is an international medium directed towards the needs of academic, clinical, government and industrial analysis by publishing original research reports and critical reviews on pharmaceutical and biomedical analysis. It covers the interdisciplinary aspects of analysis in the pharmaceutical, biomedical and clinical sciences, including developments in analytical methodology, instrumentation, computation and interpretation. Submissions on novel applications focusing on drug purity and stability studies, pharmacokinetics, therapeutic monitoring, metabolic profiling; drug-related aspects of analytical biochemistry and forensic toxicology; quality assurance in the pharmaceutical industry are also welcome.
Studies from areas of well-established and poorly selective methods, such as UV-VIS spectrophotometry (including derivative and multi-wavelength measurements), basic electroanalytical (potentiometric, polarographic and voltammetric) methods, fluorimetry, flow-injection analysis, etc., are accepted for publication in exceptional cases only, if a unique and substantial advantage over presently known systems is demonstrated. The same applies to the assay of simple drug formulations by any kind of method and to the determination of drugs in biological samples based merely on spiked samples. Drug purity/stability studies should contain information on the structure elucidation of the impurities/degradants.