Identification of geographical origin and adulteration of Northeast China soybeans by mid-infrared spectroscopy and spectra augmentation

IF 1.7 · Zone 3, Agricultural and Forestry Sciences · Q4 FOOD SCIENCE & TECHNOLOGY

Journal of Consumer Protection and Food Safety Pub Date : 2023-12-19 DOI:10.1007/s00003-023-01471-8

Yuhui Xiao, Honghao Cai, Hui Ni

Volume 19, issue 1, pages 99-111. Journal Article. https://link.springer.com/article/10.1007/s00003-023-01471-8


Abstract

Mathematical models based on infrared spectroscopy and machine learning have been used successfully to trace the origin of soybeans. However, previous research reports that optimal accuracy during model training requires spectra that have undergone multiple pre-processing operations, and the resulting models can only predict samples that received identical spectral pre-processing. Baseline correction in particular must be applied to each spectrum individually, which demands substantial time and human resources for a large dataset. In this study, a spectra augmentation technique based on the theory of data augmentation was proposed to simplify or even eliminate the pre-processing steps for the prediction dataset. The technique trains models on a combination of standard pre-processed spectra and “boost” data, that is, spectra without the standard pre-processing: 180 spectra in total, comprising 90 pre-processed standard spectra and 90 “boost” spectra. On a prediction dataset without the standard spectral pre-processing, the model trained with spectra augmentation reached an accuracy of 0.91 for recognizing Northeast China soybeans, whereas the model trained with the method frequently reported in previous studies reached only 0.71, demonstrating that the proposed technique yields higher robustness and generalization capability. The spectra augmentation technique can maintain high accuracy while simplifying spectral pre-processing of prediction data, and therefore provides a more efficient and faster method for practical food traceability and authentication.
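The way the augmented training set is assembled can be illustrated with a minimal sketch. The abstract does not give the authors' code, so everything here is an assumption for illustration: the function names are hypothetical, the endpoint-based linear baseline removal merely stands in for the paper's unspecified "standard" pre-processing, and the "boost" spectra are assumed to be the unprocessed versions of the same samples, which the abstract does not confirm.

```python
from typing import Callable, List, Sequence, Tuple

Spectrum = List[float]


def remove_linear_baseline(spectrum: Sequence[float]) -> Spectrum:
    """Stand-in pre-processing (assumption, not the paper's method):
    subtract the straight line that joins the first and last points."""
    n = len(spectrum)
    if n < 2:
        return [0.0 for _ in spectrum]
    first, last = spectrum[0], spectrum[-1]
    slope = (last - first) / (n - 1)
    return [y - (first + slope * i) for i, y in enumerate(spectrum)]


def build_augmented_training_set(
    raw_spectra: Sequence[Sequence[float]],
    labels: Sequence[str],
    preprocess: Callable[[Sequence[float]], Spectrum],
) -> Tuple[List[Spectrum], List[str]]:
    """Combine pre-processed spectra with unprocessed "boost" spectra.

    With 90 raw spectra this yields 180 training spectra:
    90 pre-processed followed by 90 left as measured.
    """
    if len(raw_spectra) != len(labels):
        raise ValueError("need exactly one label per spectrum")
    processed = [preprocess(s) for s in raw_spectra]
    boost = [list(s) for s in raw_spectra]  # copied, no pre-processing applied
    # Each label appears twice: once for the processed copy, once for the boost copy.
    return processed + boost, list(labels) + list(labels)
```

Any classifier can then be fitted on the returned spectra and labels. The intent described in the abstract is that a model trained this way can be applied to prediction spectra that have not been pre-processed.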



Source journal

Journal of Consumer Protection and Food Safety (Veterinary-Food Animals)

CiteScore: 3.70

Self-citation rate: 4.20%

Review time: >12 weeks

Journal description: The JCF publishes peer-reviewed original Research Articles and Opinions that are of direct importance to Food and Feed Safety. This includes Food Packaging, Consumer Products as well as Plant Protection Products, Food Microbiology, Veterinary Drugs, Animal Welfare and Genetic Engineering. All published peer-reviewed articles should be devoted to improving Consumer Health Protection. Reviews and discussions that address legal and/or regulatory decisions on risk assessment and management of Food and Feed Safety issues on a scientific basis are welcome. The journal addresses an international readership of scientists, risk assessors and managers, and other professionals active in the field of Food and Feed Safety and Consumer Health Protection. Manuscripts, preferably written in English but also in German, are published as Research Articles, Reviews, Methods and Short Communications and should cover aspects including, but not limited to:
· Factors influencing Food and Feed Safety
· Factors influencing Consumer Health Protection
· Factors influencing Consumer Behavior
· Exposure science related to Risk Assessment and Risk Management
· Regulatory aspects related to Food and Feed Safety, Food Packaging, Consumer Products, Plant Protection Products, Food Microbiology, Veterinary Drugs, Animal Welfare and Genetic Engineering
· Analytical methods and method validation related to food control and food processing
The JCF also presents important News, as well as Announcements and Reports about administrative surveillance.