Comparison of machine learning models for classifying edible oils using Fourier-transform infrared spectroscopy
Hyeona Lim, Seon Yeong Lee, Jin Young Kim, Yeon Ju Shin, Yerin Jang, Hyeonjin Kim, Byung Hee Kim, Sangdoo Ahn
Bulletin of the Korean Chemical Society, vol. 46, no. 2, pp. 131-137. Published 2024-12-23. DOI: 10.1002/bkcs.12932. https://onlinelibrary.wiley.com/doi/10.1002/bkcs.12932
Abstract
Accurate classification and authentication of edible oils are essential for maintaining product quality, ensuring consumer safety, and preserving market integrity. This study proposes Fourier-transform infrared (FT-IR) spectroscopy, combined with advanced machine learning models, as a rapid and non-destructive technique for classifying edible oils. The FT-IR spectra of seven edible oil types were analyzed across three spectral regions: the full range, the C-H stretching range, and the fingerprint region. Both absorbance and second derivative spectra were used to evaluate the influence of spectral preprocessing on classification accuracy. Six machine learning models were employed to classify the oils: principal component analysis followed by linear discriminant analysis (PCA-LDA), k-nearest neighbors, decision tree, random forest, eXtreme Gradient Boosting, and support vector machines (SVM). The models achieved training accuracies of 96.4%–100% and testing accuracies of 88.1%–100%. The second derivative spectra enhanced model performance by improving the resolution of overlapping peaks, particularly in the C-H and CO stretching regions. SHapley Additive exPlanations (SHAP) analysis revealed the spectral features that most strongly influenced model predictions, offering insight into the models' decision-making. This study demonstrates the effectiveness of combining FT-IR spectroscopy, second derivative preprocessing, and machine learning techniques for classifying edible oils. The findings highlight the benefit of second derivative spectra in enhancing spectral resolution and the superior classification performance of the PCA-LDA and SVM models. These results offer a robust framework for advancing edible oil analysis and emphasize the potential of artificial intelligence in food authentication and quality control.
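The abstract's point about second derivative spectra resolving overlapping peaks can be illustrated with a minimal sketch. It is not the authors' preprocessing: the abstract does not say how the derivatives were computed, so a plain central finite difference is assumed here, applied to a noise-free synthetic spectrum. The band centers (1740 and 1758 cm⁻¹), the band width, and all function names are hypothetical choices made only for illustration.

```python
import math


def two_band_spectrum(wavenumbers, centers=(1740.0, 1758.0), width=10.0):
    """Sum of two equal-height Gaussian bands (hypothetical centers and width)."""
    return [
        sum(math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers)
        for x in wavenumbers
    ]


def second_derivative(values, step):
    """Central finite-difference second derivative at the interior points."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / step ** 2
        for i in range(1, len(values) - 1)
    ]


def peak_positions(xs, values, min_fraction=0.5):
    """x positions of local maxima reaching min_fraction of the global maximum."""
    threshold = min_fraction * max(values)
    return [
        xs[i]
        for i in range(1, len(values) - 1)
        if values[i] >= threshold
        and values[i] > values[i - 1]
        and values[i] >= values[i + 1]
    ]


wavenumbers = [float(x) for x in range(1700, 1801)]  # synthetic 1 cm^-1 grid
absorbance = two_band_spectrum(wavenumbers)
d2 = second_derivative(absorbance, step=1.0)

# Bands are maxima in absorbance but minima in the second derivative,
# so the derivative is negated before searching for peaks.
absorbance_peaks = peak_positions(wavenumbers, absorbance)
derivative_peaks = peak_positions(wavenumbers[1:-1], [-v for v in d2])

print("absorbance maxima:", absorbance_peaks)
print("second-derivative minima:", derivative_peaks)
```

With these assumed parameters the two bands sit closer together than twice their width, so the absorbance curve shows a single merged maximum, while the second derivative shows one minimum near each band center. Real spectra contain noise, which differentiation amplifies, so a smoothing derivative filter would normally replace the raw finite difference used here.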
About the journal:
The Bulletin of the Korean Chemical Society is an official research journal of the Korean Chemical Society. It was founded in 1980 and reaches out to the chemical community worldwide. It is strictly peer-reviewed and welcomes Accounts, Communications, Articles, and Notes written in English. The scope of the journal covers all major areas of chemistry: analytical chemistry, electrochemistry, industrial chemistry, inorganic chemistry, life-science chemistry, macromolecular chemistry, organic synthesis, non-synthetic organic chemistry, physical chemistry, and materials chemistry.