Adulteration detection in cactus seed oil: Integrating analytical chemistry and machine learning approaches

IF 7 · Zone 2, Agricultural and Forestry Sciences · Q1 FOOD SCIENCE & TECHNOLOGY

Current Research in Food Science · Pub Date: 2025-01-01 · DOI: 10.1016/j.crfs.2025.100986

Said El Harkaoui, Cristina Ortiz Cruz, Aaron Roggenland, Micha Schneider, Sascha Rohn, Stephan Drusch, Bertrand Matthäus

Volume 10, Article 100986 · https://www.sciencedirect.com/science/article/pii/S2665927125000176

Citations: 0

Abstract

Economically motivated adulteration threatens both consumer rights and market integrity, particularly for high-value cold-pressed oils such as cactus seed oil (CO). This study proposes a machine learning approach that integrates analytical measurements, data simulation, and classification to detect adulteration of CO with refined sunflower oil (SO) and to determine the detection limit without measuring a large number of different mixtures. First, pure CO and SO samples were analyzed for their fatty acid, triacylglycerol, and tocochromanol content using HPLC or GC. The resulting composition data served as the foundation for the simulations. Monte Carlo (MC) simulation outperformed Conditional Tabular Generative Adversarial Networks (CTGAN) in simulating realistic oil compositions, yielding lower Kullback-Leibler divergence values. The MC-simulated data were then used to generate larger datasets, a critical step because robust training cannot be achieved with small sample sizes. These datasets were used to train and test two classification models: Random Forest (RF) and Neural Networks (NN). Both models achieved good classification accuracy, with RF outperforming NN: RF reached 94% accuracy on simulated datasets and 90% on real-world samples, with detectable adulteration levels as low as 1%. RF also offers better interpretability and is computationally less demanding than NN, which makes it advantageous for authenticity verification in this study. Combining MC simulation with RF is therefore proposed as a robust method for detecting CO adulteration. The proposed method, coded in Python and available as open source, offers a flexible framework for continuous adaptation to new data.
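The simulation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' published code. It assumes that each marker in a pure oil can be sampled independently from a normal distribution, that blends mix linearly by mass fraction, and that a histogram estimate of the Kullback-Leibler divergence is adequate for comparing distributions. The marker names, the means and standard deviations in `PURE_CO` and `PURE_SO`, and all function names are hypothetical placeholders, not values or identifiers from the study.

```python
import math
import random

# Hypothetical (mean, std) in % of total fatty acids for two markers.
# Placeholder numbers for illustration only, NOT measurements from the study.
PURE_CO = {"linoleic": (60.0, 2.0), "oleic": (20.0, 1.5)}
PURE_SO = {"linoleic": (55.0, 3.0), "oleic": (30.0, 2.5)}


def simulate_oil(profile, rng):
    """Monte Carlo draw of one composition (assumes independent normal markers)."""
    return {name: max(0.0, rng.gauss(mean, std))
            for name, (mean, std) in profile.items()}


def simulate_blend(fraction_so, rng):
    """Blend one simulated CO and one simulated SO sample (assumes linear mixing)."""
    co = simulate_oil(PURE_CO, rng)
    so = simulate_oil(PURE_SO, rng)
    return {k: (1.0 - fraction_so) * co[k] + fraction_so * so[k] for k in co}


def kl_divergence(p_samples, q_samples, bins=20):
    """Histogram estimate of KL(P || Q) on a shared grid, with add-one smoothing."""
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values coincide

    def hist(samples):
        counts = [1.0] * bins  # add-one smoothing avoids log(0) and division by 0
        for x in samples:
            counts[min(int((x - lo) / width), bins - 1)] += 1.0
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(p_samples), hist(q_samples)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    pure = [simulate_blend(0.0, rng)["oleic"] for _ in range(2000)]
    for level in (0.01, 0.05, 0.20, 0.50):
        mixed = [simulate_blend(level, rng)["oleic"] for _ in range(2000)]
        print(f"SO fraction {level:.2f}: KL vs pure CO = {kl_divergence(pure, mixed):.4f}")
```

Simulated blends labeled by their SO fraction could then serve as training data for a classifier such as scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch dependency-free.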




Source journal

Current Research in Food Science (Agricultural and Biological Sciences: Food Science)

CiteScore: 7.40

Self-citation rate: 3.20%

Articles published: 232

Review time: 84 days

About the journal: Current Research in Food Science is an international peer-reviewed journal dedicated to advancing the breadth of knowledge in the field of food science. It serves as a platform for publishing original research articles and short communications that encompass a wide array of topics, including food chemistry, physics, microbiology, nutrition, nutraceuticals, process and package engineering, materials science, food sustainability, and food security. By covering these diverse areas, the journal aims to provide a comprehensive source of the latest scientific findings and technological advancements that are shaping the future of the food industry. The journal's scope is designed to address the multidisciplinary nature of food science, reflecting its commitment to promoting innovation and ensuring the safety and quality of the food supply.