Na Zhao, Zhisheng Wu, Chunying Wu, Shuyu Wang, Xueyan Zhan
Title: Performance evaluation of variable selection methods coupled with partial least squares regression to determine the target component in solid samples
Journal: Journal of Near Infrared Spectroscopy (impact factor 1.6; JCR Q3, Chemistry, Applied)
Volume: 30 1
Pages: 171 - 178
Publication date: 2022-05-12
Publication type: Journal Article
DOI: 10.1177/09670335221097236
URL: https://doi.org/10.1177/09670335221097236
Citations: 0
Abstract
Variable selection can improve the robustness and prediction accuracy of partial least squares (PLS) regression models and reduce calculation time by selecting an optimal subset of variables in multivariate calibration. This study investigates the performance of two types of variable selection methods, wavelength interval selection and individual wavelength selection, coupled with PLS regression, using experimental data on the asiaticoside (AS) and madecassoside (MS) contents of centella total glucosides (CTG) and a public corn dataset. The methods studied are interval partial least squares regression (iPLS), backward interval partial least squares (biPLS), synergy interval partial least squares regression (siPLS), competitive adaptive reweighted sampling (CARS), uninformative variable elimination (UVE) and variable importance in projection (VIP). Variable selection improved model performance compared with full-spectrum modeling, and all of the methods improved the prediction of AS or MS contents in CTG. With the number of latent variables in the PLS models kept below 10 for practical application, the RPD of the AS model built with iPLS was 7.5 and the RPD of the MS model built with biPLS was 2.9. Wavelength interval selection gave better results than individual wavelength selection, especially with iPLS and biPLS. The same conclusion held for moisture in the public corn data, where the biPLS model had an RPD of 1.6. Wavelength interval selection methods such as iPLS or biPLS are therefore appropriate for improving the accuracy and robustness of PLS models that determine the contents of target components in solid samples.
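To make the interval-selection idea concrete, the following is a minimal sketch of a basic iPLS search, not the authors' implementation. It assumes the simplest variant: split the spectrum into equal-width intervals, fit a PLS model on each, and keep the interval with the lowest cross-validated error. The function names (`pls1_fit`, `cv_rmse`, `ipls`, `rpd`), the NIPALS-based PLS1 fit, the fold scheme, and the synthetic data are all assumptions made for illustration. RPD is computed here as the standard deviation of the reference values divided by the RMSE of prediction, which is one common convention; the abstract does not state which definition the paper uses.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS with NIPALS; return (beta, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                       # direction of maximum covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-12:                    # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                          # scores
        tt = t @ t
        p = Xk.T @ t / tt                   # X loadings
        c = (yk @ t) / tt                   # y loading
        Xk = Xk - np.outer(t, p)            # deflate X
        yk = yk - c * t                     # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)  # regression vector in the original X space
    return beta, x_mean, y_mean


def cv_rmse(X, y, n_components, n_folds=5):
    """Root mean square error of k-fold cross-validation (RMSECV)."""
    fold = np.arange(len(y)) % n_folds      # interleaved fold assignment
    pred = np.empty(len(y), dtype=float)
    for k in range(n_folds):
        test = fold == k
        beta, x_mean, y_mean = pls1_fit(X[~test], y[~test], n_components)
        pred[test] = (X[test] - x_mean) @ beta + y_mean
    return float(np.sqrt(np.mean((y - pred) ** 2)))


def rpd(y_ref, y_pred):
    """RPD as SD of the reference values over RMSE (one common definition)."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_ref - y_pred) ** 2))
    return float(np.std(y_ref, ddof=1) / rmse)


def ipls(X, y, n_intervals, n_components, n_folds=5):
    """Return (rmsecv, start, stop) for the single best interval."""
    chunks = np.array_split(np.arange(X.shape[1]), n_intervals)
    scored = []
    for idx in chunks:
        start, stop = int(idx[0]), int(idx[-1]) + 1
        scored.append((cv_rmse(X[:, start:stop], y, n_components, n_folds), start, stop))
    return min(scored)


# Synthetic "spectra": only variables 40..49 carry information about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 100))
y = X[:, 40:50].sum(axis=1) + rng.normal(scale=0.1, size=60)

rmsecv_full = cv_rmse(X, y, n_components=2)
rmsecv_best, start, stop = ipls(X, y, n_intervals=10, n_components=2)
print(f"full spectrum RMSECV = {rmsecv_full:.3f}")
print(f"best interval [{start}, {stop}) RMSECV = {rmsecv_best:.3f}")
```

On real NIR data the intervals would be contiguous wavelength ranges, and the number of intervals and latent variables would themselves be tuned. biPLS and siPLS extend the same loop by removing intervals one at a time or by evaluating combinations of intervals, respectively.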
About the journal:
JNIRS (Journal of Near Infrared Spectroscopy) is a peer-reviewed journal that publishes original research papers, short communications, review articles and letters on near infrared spectroscopy and technology, its applications, new instrumentation, and the use of chemometric and data handling techniques within NIR.