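ML-driven olive oil quality prediction: Comparative evaluation of FTMIR data preprocessing techniques using RF and XGBoost models in multi-stage validation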

IF 3.6
Lahcen Hssaini
Journal: Measurement: Food, vol. 19, Article 100249 (published 2025-09-01)
DOI: 10.1016/j.meafoo.2025.100249
URL: https://www.sciencedirect.com/science/article/pii/S277227592500036X
Citations: 0

Abstract

This study evaluates the impact of four mid-infrared FTIR spectral preprocessing strategies (baseline correction, normalization, smoothing, and first-derivative transformation), each compared against raw data, on the performance of Random Forest (RF) and XGBoost (XGB) models for predicting key olive oil quality parameters in the Picholine Marocaine cultivar, mainly total phenolic content (TPC), total flavonoid content (TFC), DPPH radical scavenging activity, and carotenoid levels. Using a dataset of 324 olive oil samples, models were trained and validated via a multi-stage framework (5-fold cross-validation and 20 % external validation). Smoothing significantly enhanced TPC prediction (XGB R² = 0.96, RMSE = 24.5 mg GAE/kg), while first-derivative transformation gave the best TFC prediction (R² = 0.93, RMSE = 18.2 mg QE/kg). Raw data sufficed for carotenoids (R² > 0.89). XGBoost consistently outperformed RF by 7–15 % across parameters, which the study attributes to its superior regularization capabilities. Notably, blind testing exposed a 25 % drop in R² for DPPH with RF, underscoring the necessity of external validation. These findings support the development of rapid, non-destructive quality assessment tools with applications in industrial quality control, authentication systems, and regulatory compliance. Future research should explore hybrid preprocessing combinations, deep chemometric feature extraction, multi-cultivar validation, and seasonal model transferability to enhance robustness and commercial viability.
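The four preprocessing strategies and the two-stage split described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not say which baseline, normalization, or smoothing algorithms were used, so the polynomial baseline, standard normal variate (SNV), and moving-average choices below are assumptions, as are all function names, the window size, and the synthetic spectra standing in for the 324 real samples. Model fitting with RF or XGBoost is left out; only the data handling and the R²/RMSE metrics are shown.

```python
import numpy as np

def baseline_correct(spectra, wavenumbers, degree=1):
    """Subtract a low-order polynomial fitted to each spectrum (assumed method)."""
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        coeffs = np.polyfit(wavenumbers, row, degree)
        corrected[i] = row - np.polyval(coeffs, wavenumbers)
    return corrected

def snv_normalize(spectra):
    """Standard normal variate: centre and scale each spectrum (assumed method)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std

def moving_average(spectra, window=5):
    """Centred moving average (assumed method); edge points are zero-padded."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, spectra
    )

def first_derivative(spectra, wavenumbers):
    """Numerical first derivative of absorbance with respect to wavenumber."""
    return np.gradient(spectra, wavenumbers, axis=1)

def split_indices(n_samples, external_fraction=0.2, n_folds=5, seed=0):
    """Hold out an external set, then partition the rest into CV folds."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_external = int(round(n_samples * external_fraction))
    external, internal = order[:n_external], order[n_external:]
    return external, np.array_split(internal, n_folds)

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root-mean-square error."""
    residual = y_true - y_pred
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(np.mean(residual ** 2))

# Synthetic stand-in data: 324 spectra, 200 points, with a sloped baseline.
wavenumbers = np.linspace(4000.0, 400.0, 200)
spectra = np.random.default_rng(1).random((324, 200)) + 0.001 * wavenumbers

# One feature matrix per preprocessing variant, raw data included.
variants = {
    "raw": spectra,
    "baseline": baseline_correct(spectra, wavenumbers),
    "normalized": snv_normalize(spectra),
    "smoothed": moving_average(spectra),
    "first_derivative": first_derivative(spectra, wavenumbers),
}

# The external set is never touched during cross-validation.
external, folds = split_indices(len(spectra))
```

Each entry in `variants` would be paired with the same `external` and `folds` indices, so that differences in R² and RMSE reflect the preprocessing rather than the split. Comparing the cross-validated score with the external-set score is what would reveal a drop like the one the abstract reports for DPPH with RF.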
Source journal metrics: CiteScore 3.10; self-citation rate 0.00%; articles published 0