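ML-driven olive oil quality prediction: Comparative evaluation of FTMIR data preprocessing techniques using RF and XGBoost models in multi-stage validation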

IF 3.6

Measurement: Food | Pub Date: 2025-09-01 | DOI: 10.1016/j.meafoo.2025.100249

Lahcen Hssaini

Volume 19, Article 100249 | Journal Article | https://www.sciencedirect.com/science/article/pii/S277227592500036X

Citations: 0

Abstract

This study evaluates the impact of four mid-FTIR spectral preprocessing strategies (baseline correction, normalization, smoothing, and first derivative transformation), each compared to raw data, on the performance of Random Forest (RF) and XGBoost (XGB) models for predicting key olive oil quality parameters, mainly total phenolic content (TPC), total flavonoid content (TFC), DPPH radical scavenging activity, and carotenoid levels, in the Picholine Marocaine cultivar. Using a dataset of 324 olive oil samples, models were trained and validated via a multi-stage framework (5-fold CV and 20% external validation). Results revealed that smoothing significantly enhanced TPC prediction (XGB R² = 0.96, RMSE = 24.5 mg GAE/kg), while first derivative transformation optimized TFC prediction (R² = 0.93, RMSE = 18.2 mg QE/kg). Raw data sufficed for carotenoids (R² > 0.89). XGBoost consistently outperformed RF by 7–15% across parameters due to its superior regularization capabilities. Notably, blind testing exposed a 25% R² drop for DPPH with RF, underscoring the necessity of external validation. These findings support the development of rapid, non-destructive quality assessment tools with applications in industrial quality control, authentication systems, and regulatory compliance. Future research should explore hybrid preprocessing combinations, deep chemometric feature extraction, multi-cultivar validation, and seasonal model transferability to enhance robustness and commercial viability.
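The multi-stage validation described in the abstract (a 20% external set plus 5-fold CV on the remainder) and its two reported metrics can be illustrated with a minimal sketch. The abstract does not say how the split was made or how the "R² drop" was calculated, so the function names, the random shuffle, the seed, and the relative-drop formula below are all assumptions for illustration, not the author's procedure.

```python
import math
import random


def holdout_split(n_samples, holdout_fraction=0.20, seed=0):
    """Shuffle sample indices and reserve a fraction as an external (blind) set.

    Assumption: a simple random split; the paper may have stratified differently.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_samples * holdout_fraction)
    return indices[n_holdout:], indices[:n_holdout]  # (development, external)


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. mg GAE/kg)."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def relative_r2_drop(r2_cv, r2_external):
    """Relative loss in R² from cross-validation to external testing.

    Assumption: one possible reading of an "R² drop"; an absolute
    difference (r2_cv - r2_external) is an equally plausible reading.
    """
    return (r2_cv - r2_external) / r2_cv


development, external = holdout_split(324)  # 324 samples, as in the abstract
print(len(development), len(external))      # 259 65
for train, validation in k_fold(development):
    print(len(train), len(validation))
```

In practice each fold would fit an RF or XGBoost model on `train`, score it on `validation`, and only then score the final model once on `external`; comparing the two scores is what exposes the kind of gap the abstract reports for DPPH.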


Source journal

Measurement: Food

CiteScore

3.10

Self-citation rate

0.00%

Publication count