Efficient Wavelength Selection for Limited Near-Infrared Spectral Data via Genetic Algorithm and Hybrid Regression

IF 2.1 | CAS Zone 4, Chemistry | JCR Q1 (SOCIAL WORK)
Esra Pamukçu
DOI: 10.1002/cem.70015
Journal of Chemometrics, vol. 39, no. 3. Published 2025-02-24. Journal Article.
https://onlinelibrary.wiley.com/doi/10.1002/cem.70015
Citations: 0

Efficient Wavelength Selection for Limited Near-Infrared Spectral Data via Genetic Algorithm and Hybrid Regression

Spectral data often contains a large number of variables that are highly correlated. Although Partial Least Squares (PLS) regression is specifically designed to handle issues arising from limited sample sizes, its effectiveness may still diminish in extremely small datasets, making it challenging to construct a calibration model with high predictive performance. This study introduces a new framework, the Genetic Algorithm and Hybrid Regression Model (GAHRM), designed specifically for variable selection and regression in high-dimensional, low-sample-size spectral datasets. GAHRM integrates Hybrid Regression, which constructs regression models using a covariance structure that is first stabilized through Thomaz Stabilization and then regularized, with Genetic Algorithm (GA), an efficient optimization technique for selecting the best subset of variables among a vast model space. Unlike traditional approaches that rely on exhaustive search for model selection criteria, GAHRM leverages GA to navigate the exponentially large search space, enabling computationally feasible and robust model construction. The effectiveness of GAHRM was validated on the benchmark “Gasoline” dataset, where it demonstrated superior performance compared to PLS in terms of prediction accuracy and model selection efficiency. These results highlight GAHRM as a powerful alternative for wavelength selection and calibration modeling in challenging data scenarios.
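
The idea of letting a genetic algorithm search over wavelength subsets can be illustrated with a small sketch. This is not the GAHRM method from the paper: plain ridge regression stands in for Hybrid Regression (no Thomaz stabilization is performed), leave-one-out RMSE stands in for the paper's model selection criterion, and the data are synthetic rather than the Gasoline dataset. The names `loo_rmse` and `ga_select` and all parameter values are assumptions made for illustration.

```python
import numpy as np

def loo_rmse(X, y, mask, lam=1e-2):
    """Leave-one-out RMSE of ridge regression on the selected columns."""
    if not mask.any():
        return np.inf  # an empty subset is never acceptable
    Xs = X[:, mask]
    n = Xs.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xt, yt = Xs[keep], y[keep]
        mu_x, mu_y = Xt.mean(axis=0), yt.mean()
        Xc = Xt - mu_x
        # The ridge penalty keeps the system solvable when columns outnumber samples.
        A = Xc.T @ Xc + lam * np.eye(Xc.shape[1])
        beta = np.linalg.solve(A, Xc.T @ (yt - mu_y))
        errs[i] = y[i] - (mu_y + (Xs[i] - mu_x) @ beta)
    return float(np.sqrt(np.mean(errs ** 2)))

def ga_select(X, y, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Search binary column masks with a simple elitist genetic algorithm."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5
    pop[0] = True  # include the full model so the result is never worse than it
    for _ in range(generations):
        scores = np.array([loo_rmse(X, y, m) for m in pop])
        pop = pop[np.argsort(scores)]  # fittest first
        children = [pop[0].copy()]     # elitism: carry the best mask forward
        while len(children) < pop_size:
            # Binary tournaments: in the sorted population, the lower index is fitter.
            pa = pop[rng.integers(pop_size, size=2).min()]
            pb = pop[rng.integers(pop_size, size=2).min()]
            child = np.where(rng.random(p) < 0.5, pa, pb)  # uniform crossover
            child ^= rng.random(p) < p_mut                 # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([loo_rmse(X, y, m) for m in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Synthetic high-dimensional, low-sample-size data: 15 samples, 40 "wavelengths".
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 40))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + X[:, 29] + 0.05 * rng.normal(size=15)

mask, score = ga_select(X, y)
print("selected columns:", np.flatnonzero(mask))
print("leave-one-out RMSE:", round(score, 3))
```

With 40 candidate columns there are 2^40 possible subsets, so exhaustive search is impractical; the sketch evaluates only 30 masks per generation. Seeding the population with the full mask and keeping the best mask each generation guarantees that the returned subset scores no worse than using every column.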

Source journal: Journal of Chemometrics (Chemistry, Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.