Bagging and boosting machine learning algorithms for modelling sensory perception from simple chemical variables: Wine mouthfeel as a case study
María-Pilar Sáenz-Navajas, Chelo Ferreira, Susan E.P. Bastian, David W. Jeffery
Food Quality and Preference, volume 129, Article 105494. Published 2025-03-01. Journal Article.
DOI: 10.1016/j.foodqual.2025.105494
https://www.sciencedirect.com/science/article/pii/S0950329325000692
Abstract
Aiming to predict sensory properties from chemical data, the application of bagging and boosting machine learning (ML) algorithms was comprehensively investigated and applied to the modelling of red wine mouthfeel from simple chemical measurements. A panel of 15 Australian winemakers described the mouthfeel properties of a total of 30 commercial red wines from Australia and Spain using rate-all-that-apply sensory methodology. In parallel, linear sweep voltammetry signals, excitation-emission matrix (EEM) data, and absorbance data were acquired for the wines. Data were analysed following unsupervised statistical strategies including principal component analysis (PCA with varimax rotation) to simplify the interpretation of sensory variables, along with supervised regression models based on ML, namely random forest (RF) and extreme gradient boosting (XGBoost). PCA results showed that four independent and uncorrelated sensory dimensions mainly related to perceptions of ‘drying’, ‘full body’, ‘velvety’, and ‘gummy’ differentiated among the wines. The RF and XGBoost algorithms yielded superior validated regression models compared to classical partial least squares (PLS) modelling. The ML algorithms exhibited strong predictive performance on test data, with an average value exceeding 80% accuracy for any of the three sets of chemical variables employed. Although XGBoost provided slightly better models, the low computational effort required by RF is advantageous. Key variables included in the models are discussed along with the importance of controlling overfitting. Overall, absorbance, voltammetric or EEM signals coupled with RF or XGBoost algorithms are presented as cheap, easy-to-use, and rapid approaches to predicting sensory properties from chemical signals in complex matrices such as wine.
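The abstract names bagging, the resampling idea underlying random forest, without describing it. As a rough illustration only, and not the authors' pipeline, the sketch below averages one-split regression "stumps" fitted on bootstrap resamples of a synthetic single-feature dataset, then reports error on training and held-out samples. Everything here is an assumption made for the example: the function names, the synthetic data, the 20/10 split, and the ensemble size of 50. Random forest additionally grows full trees and samples random feature subsets at each split, which this sketch omits.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump on one feature by minimising squared error."""
    best = None
    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values identical in this resample: fall back to predicting the mean.
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_models, rng):
    """Bagging: fit one stump per bootstrap resample (sampling with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """The ensemble prediction is the average of the individual stump predictions."""
    return mean(predict_stump(m, x) for m in models)


def mse(models, xs, ys):
    return mean((predict_bagged(models, x) - y) ** 2 for x, y in zip(xs, ys))


def demo(seed=0):
    """Synthetic data (NOT the study's data): y = 2x plus Gaussian noise, 30 samples."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(30)]
    ys = [2.0 * x + rng.gauss(0.0, 0.2) for x in xs]
    train_x, test_x = xs[:20], xs[20:]
    train_y, test_y = ys[:20], ys[20:]
    models = fit_bagged(train_x, train_y, n_models=50, rng=rng)
    return mse(models, train_x, train_y), mse(models, test_x, test_y)


if __name__ == "__main__":
    train_err, test_err = demo()
    print(f"train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```

Comparing the training and held-out errors is one simple way to spot overfitting with a small sample. In practice, a library implementation with cross-validation would be the more likely choice.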
About the journal:
Food Quality and Preference is a journal devoted to sensory, consumer and behavioural research in food and non-food products. It publishes original research, critical reviews, and short communications in sensory and consumer science, and sensometrics. In addition, the journal publishes special invited issues on important timely topics and from relevant conferences. These are aimed at bridging the gap between research and application, bringing together authors and readers in consumer and market research, sensory science, sensometrics and sensory evaluation, nutrition and food choice, as well as food research, product development and sensory quality assurance. Submissions to Food Quality and Preference are limited to papers that include some form of human measurement; papers that are limited to physical/chemical measures or the routine application of sensory, consumer or econometric analysis will not be considered unless they specifically make a novel scientific contribution in line with the journal's coverage as outlined below.