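Performance evaluation of variable selection methods coupled with partial least squares regression to determine the target component in solid samples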

IF 1.6; Zone 4, Chemistry; JCR Q3, CHEMISTRY, APPLIED
Na Zhao, Zhisheng Wu, Chunying Wu, Shuyu Wang, Xueyan Zhan
Journal of Near Infrared Spectroscopy. Published 2022-05-12. DOI: 10.1177/09670335221097236 (https://doi.org/10.1177/09670335221097236)
Citations: 0

Abstract
Variable selection can improve the robustness and prediction accuracy of partial least squares (PLS) regression models and reduce calculation time by selecting an optimal subset of variables in multivariate calibration. In this study, the performance of two classes of variable selection methods, wavelength interval selection and individual wavelength selection, coupled with PLS regression was investigated using experimental data on the asiaticoside (AS) and madecassoside (MS) contents of centella total glucosides (CTG) and a public corn dataset. The methods studied were interval partial least squares regression (iPLS), backward interval partial least squares (biPLS), synergy interval partial least squares regression (siPLS), competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE) and variable importance in projection (VIP). Variable selection improved model performance compared with full-spectrum modeling, and all of the variable selection methods improved the prediction of AS or MS contents in CTG. With the number of latent variables in the PLS models kept below 10, as in practical applications, the RPD value of the AS model built by iPLS was 7.5 and the RPD value of the MS model built by biPLS was 2.9. Wavelength interval selection gave better results than individual wavelength selection, especially with iPLS and biPLS. The same trend was obtained with the public data for moisture in corn, where the RPD value of the biPLS moisture model was 1.6. Therefore, wavelength interval selection methods such as iPLS or biPLS are appropriate for improving the accuracy and robustness of PLS models for determining the contents of target components in solid samples.
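To make the interval-selection idea concrete, the following is a minimal sketch of iPLS followed by an RPD comparison against a full-spectrum model. It is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the PLS1 model is a small NIPALS routine written here with NumPy only, all function names (pls1_fit, rmsecv, ipls, rpd) are hypothetical, and RPD, which the abstract does not define, is assumed to be the standard deviation of the reference values divided by the root mean square error of prediction.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS (illustrative stand-in)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean          # centred working copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)               # weight vector
        t = Xr @ w                           # scores
        tt = t @ t
        p = Xr.T @ t / tt                    # X loadings
        c = yr @ t / tt                      # y loading
        Xr = Xr - np.outer(t, p)             # deflate X
        yr = yr - c * t                      # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rmsecv(X, y, n_components, n_folds=5):
    """Root mean square error of k-fold cross-validation."""
    folds = np.arange(len(y)) % n_folds      # interleaved folds
    y_cv = np.empty_like(y)
    for f in range(n_folds):
        test = folds == f
        model = pls1_fit(X[~test], y[~test], n_components)
        y_cv[test] = pls1_predict(model, X[test])
    return rmse(y, y_cv)


def ipls(X, y, n_intervals, n_components):
    """Split the variables into contiguous intervals, fit one PLS model per
    interval, and return the interval with the lowest RMSECV."""
    intervals = np.array_split(np.arange(X.shape[1]), n_intervals)
    errors = [rmsecv(X[:, cols], y, n_components) for cols in intervals]
    return intervals[int(np.argmin(errors))], errors


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of reference values divided by RMSEP."""
    return float(np.std(y_true, ddof=1) / rmse(y_true, y_pred))


# Synthetic "spectra": only columns 40-59 carry the analyte signal.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 80, 100
y = rng.uniform(1.0, 10.0, n_samples)
X = rng.normal(0.0, 1.0, (n_samples, n_wavelengths))
X[:, 40:60] += np.outer(y, np.ones(20))

train, test = np.arange(60), np.arange(60, 80)
best_cols, cv_errors = ipls(X[train], y[train], n_intervals=5, n_components=2)

full = pls1_fit(X[train], y[train], 2)
sel = pls1_fit(X[train][:, best_cols], y[train], 2)
rpd_full = rpd(y[test], pls1_predict(full, X[test]))
rpd_ipls = rpd(y[test], pls1_predict(sel, X[test][:, best_cols]))
print("selected columns:", best_cols[0], "to", best_cols[-1])
print("RPD full spectrum:", round(rpd_full, 2), " RPD iPLS:", round(rpd_ipls, 2))
```

biPLS and siPLS would reuse the same building blocks: biPLS would start from all intervals and repeatedly drop the one whose removal lowers RMSECV the most, while siPLS would score combinations of intervals instead of single ones. The interval count, fold scheme and number of latent variables used here are arbitrary choices for the sketch.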
Source journal
CiteScore: 3.30
Self-citation rate: 5.60%
Articles published: 35
Review time: 6 months
About the journal: JNIRS, the Journal of Near Infrared Spectroscopy, is a peer-reviewed journal publishing original research papers, short communications, review articles and letters concerned with near infrared spectroscopy and technology, its application, new instrumentation and the use of chemometric and data handling techniques within NIR.