Near-infrared spectroscopic prediction of gasoline olefin content: A systematic approach using continuous region feature selection and region-sensitive ensemble learning

IF 3.8; Zone 2 (Chemistry); JCR Q2 (Automation & Control Systems)
Jiaxue Cui, Dawei Zhang, Banglian Xu, Jianzhong Fan, Xianglong Cao
Journal: Chemometrics and Intelligent Laboratory Systems, volume 271, Article 105661
DOI: 10.1016/j.chemolab.2026.105661
Publication date: 2026-04-15 (Epub 2026-02-05)
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0169743926000341
Citation count: 0

Abstract

This study addresses two challenges in near-infrared (NIR) spectroscopic prediction of gasoline olefin content: high-dimensional collinearity and regional information heterogeneity. It proposes a systematic optimization approach that combines a Continuous Region Utilizing Integrated Spectral Evaluation for Near-Infrared (CRUISE-NIR) algorithm with a Region-Sensitive Adaptive Ensemble Learning (RAEL) framework. The CRUISE-NIR algorithm shifts spectral analysis from a “point” to a “region” perspective, taking into account the physical correlation of adjacent wavelengths and chemical prior knowledge, and reduces 4443 original variables to 16 key features. The RAEL framework dynamically adjusts prediction weights according to each sample's performance characteristics in different spectral regions, yielding sample-specific predictions. On the test set, the proposed method achieves a root mean square error (RMSE) of 0.2795 and a coefficient of determination (R²) of 0.9646, significantly outperforming traditional methods in prediction accuracy and fitting capability. The framework was further validated on heterogeneous matrices, including SWRI Diesel, IDRC Tablets, and Soil, demonstrating generalizability across liquid and solid physical states. The results indicate that prioritizing feature quality over the number of variables significantly enhances model performance. The proposed framework demonstrates robust analytical capabilities for high-dimensional spectral data across diverse and complex molecular systems.
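To make the "region" idea and the sample-specific weighting more concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' CRUISE-NIR or RAEL implementation, whose details the abstract does not give. As stand-ins, it averages equal-width contiguous wavelength blocks to obtain 16 region features, fits one simple linear model per region, and weights each model for a new sample by the inverse of its mean absolute error on that sample's nearest training neighbours. All function names, the synthetic data, and the parameter `k` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def region_features(X, n_regions):
    """Assumed stand-in for region selection: mean of each contiguous block."""
    blocks = np.array_split(np.arange(X.shape[1]), n_regions)
    return np.column_stack([X[:, idx].mean(axis=1) for idx in blocks])

def fit_region_models(F, y):
    """One least-squares line per region; returns rows of [intercept, slope]."""
    coefs = []
    for j in range(F.shape[1]):
        A = np.column_stack([np.ones(len(F)), F[:, j]])
        coefs.append(np.linalg.lstsq(A, y, rcond=None)[0])
    return np.array(coefs)

def region_predictions(coefs, F):
    """Predictions of every region model, shape (n_samples, n_regions)."""
    return coefs[:, 0] + F * coefs[:, 1]

def local_weights(F_train, abs_err_train, F_new, k=10, eps=1e-12):
    """Per-sample weights: inverse mean error of each model on k neighbours."""
    dist = np.linalg.norm(F_new[:, None, :] - F_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    local_err = abs_err_train[nearest].mean(axis=1)
    w = 1.0 / (local_err + eps)
    return w / w.sum(axis=1, keepdims=True)

# Synthetic data only: 4443 variables as in the abstract, random "spectra".
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4443))
y = X[:, 1000:1200].mean(axis=1) + 0.01 * rng.normal(size=80)
X_train, X_test, y_train, y_test = X[:60], X[60:], y[:60], y[60:]

F_train = region_features(X_train, 16)   # 4443 variables -> 16 features
F_test = region_features(X_test, 16)
coefs = fit_region_models(F_train, y_train)
abs_err = np.abs(region_predictions(coefs, F_train) - y_train[:, None])
W = local_weights(F_train, abs_err, F_test)
y_hat = (W * region_predictions(coefs, F_test)).sum(axis=1)
print(F_train.shape, rmse(y_test, y_hat), r2_score(y_test, y_hat))
```

The numbers printed by this sketch come from random synthetic data and say nothing about the results reported in the paper; the point is only to show how region-level features and per-sample model weights fit together.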
Source journal
CiteScore: 7.50
Self-citation rate: 7.70%
Articles published: 169
Review time: 3.4 months
Journal description: Chemometrics and Intelligent Laboratory Systems publishes original research papers, short communications, reviews, tutorials and Original Software Publications reporting on development of novel statistical, mathematical, or computer techniques in Chemistry and related disciplines. Chemometrics is the chemical discipline that uses mathematical and statistical methods to design or select optimal procedures and experiments, and to provide maximum chemical information by analysing chemical data. The journal deals with the following topics: 1) Development of new statistical, mathematical and chemometrical methods for Chemistry and related fields (Environmental Chemistry, Biochemistry, Toxicology, System Biology, -Omics, etc.). 2) Novel applications of chemometrics to all branches of Chemistry and related fields (typical domains of interest are: process data analysis, experimental design, data mining, signal processing, supervised modelling, decision making, robust statistics, mixture analysis, multivariate calibration, etc.). Routine applications of established chemometrical techniques will not be considered. 3) Development of new software that provides novel tools or truly advances the use of chemometrical methods. 4) Well-characterized data sets to test performance for the new methods and software. The journal complies with the International Committee of Medical Journal Editors' Uniform Requirements for Manuscripts.