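Integrating near-infrared hyperspectral imaging with machine learning and feature selection: Detecting adulteration of extra-virgin olive oil with lower-grade olive oils and hazelnut oil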

IF 7 · Zone 2 (Agricultural and Forestry Sciences) · JCR Q1 (Food Science & Technology)

Current Research in Food Science · Pub Date: 2024-01-01 · DOI: 10.1016/j.crfs.2024.100913

Derick Malavi , Katleen Raes , Sam Van Haute

Volume 9, Article 100913. Source: https://www.sciencedirect.com/science/article/pii/S2665927124002399

Citations: 0

Abstract



Detecting adulteration in extra virgin olive oil (EVOO) is particularly challenging with oils of similar chemical composition. This study applies near-infrared hyperspectral imaging (NIR-HSI) and machine learning (ML) to detect EVOO adulteration with hazelnut, refined olive, and olive pomace oils at various concentrations (1%, 5%, 10%, 20%, 40%, and 100% m/m). Savitzky-Golay filtering, first and second derivatives, multiplicative scatter correction (MSC), standard normal variate (SNV), and their combinations were used to preprocess the spectral data, with Principal Component Analysis (PCA) reducing dimensionality. Classification was performed using Partial Least Squares-Discriminant Analysis (PLS-DA) and ML algorithms, including k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Networks (ANN). PLS-DA, k-NN, RF, SVM, NB, and ANN models achieved accuracy rates of 97.0–99.0%, 96.2–100%, 96.5–100%, 98.6–99.5%, 93.9–99.7%, and 99.2–100%, respectively, in discriminating between pure EVOO, adulterants, and adulterated oils. PLS-DA, RF, SVM, and ANN significantly outperformed Naïve Bayes (p < 0.05) in binary classification, with Matthews correlation coefficient (MCC) values exceeding 0.90. All the binary classifiers except Naïve Bayes, when coupled with SNV/MSC, Savitzky-Golay smoothing and derivatives, consistently achieved perfect scores (1.0) for accuracy, sensitivity, specificity, F1 score, precision, and MCC in distinguishing pure EVOO from adulterated oils. No significant differences (p > 0.05) in model performance were found between those using full spectra and those based on key variable selection. However, PLS-DA and ANN significantly outperformed k-NN, RF, and SVM (p < 0.05), with MCC values ranging from 0.95 to 1.00, indicating superior classification performance.
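For readers unfamiliar with the binary-classification scores quoted above (accuracy, sensitivity, specificity, precision, F1 score, and MCC), the following is a minimal sketch of how they follow from confusion-matrix counts. It is not the authors' code. The function name `binary_metrics`, the label coding (1 = adulterated, 0 = pure EVOO), and the toy labels are assumptions made for illustration, and returning 0.0 for an undefined ratio is a convention chosen here.

```python
import math

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, precision, F1 and MCC."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 when a ratio is undefined (a convention chosen for this sketch).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical labels: 1 = adulterated oil, 0 = pure EVOO (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
perfect = binary_metrics(y_true, y_true)
```

MCC uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced. A value of 1.0, as in the `perfect` case, means every sample was classified correctly.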
These findings demonstrate that combining NIR-HSI with machine learning, along with key variable selection, potentially offers an effective, non-destructive solution for detecting adulteration in EVOO and combating fraud in the olive oil industry.
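The spectral preprocessing steps named in the abstract (SNV, MSC, and Savitzky-Golay smoothing and derivatives) can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the abstract does not give the window length, polynomial order, or the order in which steps were combined, so the 5-point quadratic kernels, the helper names, and the toy spectra below are all hypothetical. The sketch uses only the standard library; in practice a numerical library would be the usual choice.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    if reference is None:
        reference = [mean(col) for col in zip(*spectra)]
    ref_mean = mean(reference)
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spec in spectra:
        spec_mean = mean(spec)
        # Least-squares fit: spec ~ intercept + slope * reference.
        slope = sum((r - ref_mean) * (x - spec_mean)
                    for r, x in zip(reference, spec)) / ref_ss
        intercept = spec_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spec])
    return corrected

# 5-point quadratic Savitzky-Golay kernels (assumed window; not stated in the abstract).
SG_SMOOTH = [-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35]
SG_DERIV1 = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]

def savgol5(spectrum, kernel):
    """Slide a 5-point kernel along the spectrum; two points are lost at each edge."""
    return [sum(k * spectrum[i + j] for j, k in enumerate(kernel))
            for i in range(len(spectrum) - 4)]

# Hypothetical spectra: one base shape with different additive and
# multiplicative scatter effects (not data from the study).
base = [0.10, 0.12, 0.18, 0.30, 0.45, 0.52, 0.50, 0.41]
raw = [[a + b * x for x in base] for a, b in [(0.00, 1.0), (0.05, 1.2), (-0.02, 0.8)]]

snv_spectra = [snv(s) for s in raw]            # scatter removed per spectrum
msc_spectra = msc(raw)                          # scatter removed against the mean spectrum
deriv_spectra = [savgol5(s, SG_DERIV1) for s in snv_spectra]
```

Because the toy spectra differ only by an offset and a positive scale factor, both SNV and MSC collapse them onto a single curve, which is the scatter-removal effect these corrections are meant to provide. The derivative is taken with respect to the band index, so it would need dividing by the wavelength step to be expressed per nanometre.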


Source journal

Current Research in Food Science (Agricultural and Biological Sciences: Food Science)

CiteScore: 7.40
Self-citation rate: 3.20%
Articles published: 232
Review time: 84 days

About the journal: Current Research in Food Science is an international peer-reviewed journal dedicated to advancing the breadth of knowledge in the field of food science. It serves as a platform for publishing original research articles and short communications that encompass a wide array of topics, including food chemistry, physics, microbiology, nutrition, nutraceuticals, process and package engineering, materials science, food sustainability, and food security. By covering these diverse areas, the journal aims to provide a comprehensive source of the latest scientific findings and technological advancements that are shaping the future of the food industry. The journal's scope is designed to address the multidisciplinary nature of food science, reflecting its commitment to promoting innovation and ensuring the safety and quality of the food supply.