Near-infrared spectral interval screening based on hierarchical variables clustering and group SCAD in multivariate calibration

Impact factor 2.7 · Zone 3 (Chemistry) · JCR Q2 (Chemistry, Analytical)
Chen-Hao Huang
Vibrational Spectroscopy, volume 131, Article 103664. Published 2024-02-17. Journal article; not open access.
DOI: 10.1016/j.vibspec.2024.103664
https://www.sciencedirect.com/science/article/pii/S0924203124000171
Citations: 0

Abstract

Spectral interval screening is a critical step in multivariate calibration: it can improve both the predictive performance of the model and the interpretability of the data. This study proposes a novel interval selection method, VCG-PLS, which combines hierarchical variable clustering and the group smoothly clipped absolute deviation penalty (group SCAD) with partial least squares (PLS). The method is designed to select informative wavelength intervals for near-infrared (NIR) spectroscopic data analysis and consists of three main steps. First, hierarchical clustering is applied to the wavelengths (variables), which partitions the variables into groups at each hierarchy level and yields all candidate wavelength intervals. Second, the groups of variables obtained from the different hierarchy levels are passed as input to group SCAD, which produces a set of candidate groups for each value of the regularization parameter. Finally, PLS models are built recursively, each time leaving out one of the wavelength intervals, until the optimal intervals are obtained. The optimal intervals are those that give the lowest root mean square error of prediction. VCG-PLS thus integrates the advantages of hierarchical variable clustering and group SCAD to enhance the performance of PLS in interval selection. Its performance was tested on three real NIR datasets. The results demonstrate that VCG-PLS can improve prediction performance with fewer variables and may be a good wavelength interval selection strategy.
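The second step, in which group SCAD yields a different set of candidate groups for each regularization parameter value, can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes the simplified setting of a group-wise orthonormal design, where the group SCAD solution reduces to applying the standard SCAD thresholding rule to the norm of each group's coefficient vector. The function names, the interval labels, the coefficient values and the grid of regularization values are all hypothetical; a = 3.7 is the conventional default for the SCAD shape parameter.

```python
import math


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for a non-negative value z.

    Small values are set to zero, intermediate values are shrunk,
    and values above a * lam are left unchanged.
    """
    if z <= lam:
        return 0.0
    if z <= 2 * lam:
        return z - lam
    if z <= a * lam:
        return ((a - 1) * z - a * lam) / (a - 2)
    return z


def group_scad_threshold(z_group, lam, a=3.7):
    """Shrink a whole group by thresholding its Euclidean norm.

    Assumption: group-wise orthonormal design, so the group is either
    zeroed out entirely or rescaled as a unit.
    """
    norm = math.sqrt(sum(v * v for v in z_group))
    if norm == 0.0:
        return [0.0] * len(z_group)
    scale = scad_threshold(norm, lam, a) / norm
    return [scale * v for v in z_group]


def selected_groups(groups, lam, a=3.7):
    """Return the names of the groups that survive thresholding at lam."""
    kept = []
    for name, z_group in groups.items():
        shrunk = group_scad_threshold(z_group, lam, a)
        if any(v != 0.0 for v in shrunk):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Hypothetical unpenalized coefficients for three wavelength intervals.
    groups = {
        "interval_A": [3.0, 4.0],  # norm 5.0
        "interval_B": [0.3, 0.4],  # norm 0.5
        "interval_C": [0.6, 0.8],  # norm 1.0
    }
    # Larger regularization values keep fewer intervals.
    for lam in (0.4, 0.8, 2.0, 6.0):
        print(lam, selected_groups(groups, lam))
```

Sweeping the regularization value from small to large produces a nested sequence of candidate interval sets, which is the kind of output the third step would then evaluate with PLS models. A real analysis would fit group SCAD on the spectra themselves with an iterative solver rather than use this closed-form rule.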

Source journal
Vibrational Spectroscopy (Chemistry: Analytical Chemistry)
CiteScore: 4.70
Self-citation rate: 4.00%
Articles published: 103
Review time: 52 days
Journal description: Vibrational Spectroscopy provides a vehicle for the publication of original research that focuses on vibrational spectroscopy. The journal covers infrared, near-infrared and Raman spectroscopies and publishes papers dealing with developments in applications, theory, techniques and instrumentation. The topics covered by the journal include: Sampling techniques, Vibrational spectroscopy coupled with separation techniques, Instrumentation (Fourier transform, conventional and laser based), Data manipulation, Spectra-structure correlation and group frequencies. The application areas covered include: Analytical chemistry, Bio-organic and bio-inorganic chemistry, Organic chemistry, Inorganic chemistry, Catalysis, Environmental science, Industrial chemistry, Materials science, Physical chemistry, Polymer science, Process control, Specialized problem solving.