Demystifying food flavor: Flavor data interpretation through machine learning
Authors: Huabin Luo, Simen Akkermans, Jan F.M. Van Impe
Journal: Food Chemistry, volume 483, Article 144000 (impact factor 9.8; JCR Q1, Chemistry, Applied)
Publication date: 2025-03-26
DOI: 10.1016/j.foodchem.2025.144000
URL: https://www.sciencedirect.com/science/article/pii/S0308814625012518
Citations: 0
Abstract
Flavor data obtained from analytical techniques are vast and complex, which complicates multi-factorial analysis. This study aims to provide a machine learning (ML)-based framework for interpreting flavor data using four widely used techniques: Principal Component Analysis (PCA), Redundancy Analysis (RDA), Partial Least Squares (PLS), and Random Forest (RF). To demonstrate the potential of these techniques, two case studies are discussed, one with semi-quantitative data and the other with quantitative data. Results indicate that PCA is useful for data exploration, that RDA can quantify the statistical significance of factors, and that combining the feature importance results from PLS and RF offers a comprehensive understanding of marker compounds. Regarding classification performance, PLS excels at handling collinear data, whereas RF captures complex patterns if sufficient data are available. However, overfitting is a risk for datasets with small sample sizes. Overall, carefully selecting and integrating these ML techniques could demystify food flavor.
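As an illustration of the exploratory role the abstract assigns to PCA, the following is a minimal sketch, not the authors' implementation. It assumes autoscaled (mean-centered, unit-variance) data and computes PCA through a singular value decomposition with NumPy. The data are synthetic and invented for illustration only: two hypothetical product groups, six samples each, eight "compounds", of which the first three differ between groups. The function name `pca_scores` and all variable names are assumptions.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Autoscale X (samples x compounds) and project it onto the leading PCs."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - mean) / std

    # SVD of the autoscaled matrix: rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # compound contributions
    explained = (S ** 2) / np.sum(S ** 2)             # variance ratio per PC
    return scores, loadings, explained[:n_components]


# Synthetic, made-up data (an assumption, not taken from the paper).
rng = np.random.default_rng(0)
n_per_group, n_compounds = 6, 8
group = np.repeat([0, 1], n_per_group)
shift = np.zeros(n_compounds)
shift[:3] = 3.0  # only the first three compounds differ between groups
X = rng.normal(scale=0.3, size=(2 * n_per_group, n_compounds)) + np.outer(group, shift)

scores, loadings, explained = pca_scores(X)
print("Explained variance ratio (PC1, PC2):", np.round(explained, 3))
print("Mean PC1 score, group 0:", round(float(scores[group == 0, 0].mean()), 3))
print("Mean PC1 score, group 1:", round(float(scores[group == 1, 0].mean()), 3))
```

In this toy setting the two groups separate along the first component, and the loadings point to the compounds that drive the separation. PCA is unsupervised, so such a plot only suggests structure; testing the significance of factors or ranking marker compounds is the role the abstract gives to RDA, PLS, and RF.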
About the journal:
Food Chemistry publishes original research papers dealing with the advancement of the chemistry and biochemistry of foods or the analytical methods/approaches used. All papers should focus on the novelty of the research carried out.