Causal Artificial Intelligence Models of Food Quality Data

IF 2.3, Zone 4 (Agricultural and Forestry Sciences), JCR Q3 (Biotechnology & Applied Microbiology)
Ž. Kurtanjek
Food Technology and Biotechnology, journal article, published 2024-01-07. DOI: 10.17113/ftb.62.01.24.8301 (https://doi.org/10.17113/ftb.62.01.24.8301)

Abstract

Research background. This study aims to emphasize the importance of artificial intelligence (AI) and causal modelling in the analysis of food quality with “big data”. AI with structural causal modelling (SCM), based on Bayes networks and deep learning, enables the integration of theoretical field knowledge in food technology with process production, physicochemical analytics, and consumer organoleptic assessments. Food products are complex in nature, and their data are high-dimensional, with intricate interrelations (correlations), and are difficult to relate to consumer sensory perception of food quality. Standard regression modelling techniques such as multiple ordinary least squares (OLS) and partial least squares (PLS) are effective for prediction by linear interpolation of observed data under cross-sectional stationary conditions. Upgrading linear regression models with machine learning (ML) accounts for nonlinear relations and reveals functional patterns, but remains prone to confounding and fails to predict under unobserved nonstationary conditions. Confounding of data variables is the main obstacle to applying regression models to food innovations under conditions not covered by the training data. Hence, this manuscript focuses on applying causal graphical models with Bayes networks to infer causal relationships and intervention effects between process variables and consumer sensory assessment of food quality.

Experimental approach. This study is based on data available in the literature on wheat bread baking quality, consumer sensory quality assessments of fermented milk products, and professional wine tasting. The data for wheat baking quality are regularized by the least absolute shrinkage and selection operator (LASSO elastic net). Bayesian statistics are applied to evaluate the model joint probability function and to infer the network structure and parameters.
The resulting SCMs are presented as directed acyclic graphs (DAGs). The d-separation criterion is applied to block confounding effects when estimating the direct and total causal effects of process variables and consumer perception on food quality. Probability distributions of the causal effects of interventions on individual process variables are presented as partial dependence plots determined by Bayes neural networks. For wine quality, the total causal effects determined by the SCMs are validated by the double machine learning (DML) algorithm.

Results and conclusions. The analysed wheat data set comprises 45 continuous variables describing chemical, physical and biochemical properties of wheat from seven Croatian cultivars over two years of controlled cultivation. LASSO regularization of the data set yielded ten key predictors that account for 98 % of the variance of the baking quality data. A random forest model that predicts quality from these key variables has a cross-validation accuracy of 75 %. Causal analysis of the relation between quality and the key predictors is based on the Bayes model depicted as a DAG. Protein content has the most important direct causal effect, with a path coefficient of 0.71; THMW (total high molecular glutenin subunits) content is an indirect cause, with a path coefficient of 0.42; and the total average causal effect (ACE) of protein is 0.65. The large data set on the quality of fermented milk products includes binary consumer sensory data (taste, odour, turbidity), continuous physical variables (temperature, fat, pH, colour), and three grade classes of consumer quality assessment. A random forest model derived for predicting the quality classification has an out-of-bag (OOB) error of 0.28 %.
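The d-separation criterion named in the experimental approach can be checked mechanically on any DAG. The following is a minimal sketch, not the author's implementation: it uses the standard ancestral moral graph test, and the toy graph with its node names (`confounder`, `process_var`, `mediator`, `quality`, `collider`) is purely hypothetical and is not the network reported in the paper.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """Return the given nodes together with all of their ancestors."""
    parents = {}
    for parent, children in dag.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, given):
    """True if x and y are d-separated by the set `given` in the DAG.

    Method: restrict to the ancestral subgraph, moralize it (link parents
    that share a child, drop directions), delete the conditioning nodes,
    then test whether x can still reach y.
    """
    keep = ancestors(dag, {x, y} | set(given))
    adj = {n: set() for n in keep}
    parents_of = {n: set() for n in keep}
    for parent, children in dag.items():
        if parent not in keep:
            continue
        for child in children:
            if child in keep:
                adj[parent].add(child)
                adj[child].add(parent)
                parents_of[child].add(parent)
    for ps in parents_of.values():
        for a, b in combinations(ps, 2):
            adj[a].add(b)
            adj[b].add(a)
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nb in adj[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return True

# Hypothetical toy DAG (parent -> children); NOT the paper's network.
dag = {
    "confounder": ["process_var", "quality"],
    "process_var": ["mediator", "collider"],
    "mediator": ["quality"],
    "quality": ["collider"],
}

# Back-door check for the total effect of process_var on quality:
# drop the edges leaving process_var, then ask whether the remaining
# (non-causal) paths are blocked by the adjustment set.
backdoor_graph = {p: c for p, c in dag.items() if p != "process_var"}

print(d_separated(backdoor_graph, "process_var", "quality", set()))           # False: back-door path is open
print(d_separated(backdoor_graph, "process_var", "quality", {"confounder"}))  # True: adjustment blocks it
print(d_separated(dag, "mediator", "confounder", {"process_var"}))            # True
print(d_separated(dag, "mediator", "confounder", {"process_var", "quality"})) # False: conditioning on a collider opens a path
```

The last two calls illustrate why the adjustment set has to be chosen from the graph rather than by adding every available variable: conditioning on a common effect (here `quality`) creates a dependence that was not there before.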
The Bayes network model predicts that the direct causes of the taste classification are temperature, colour, and fat content, while the direct causes of the quality classification are temperature, turbidity, odour, and fat content. The key estimated average causal effects (ACE) on quality grade are -0.04 grade/°C for temperature and 0.3 grade per unit of fat content. The dependence of the ACE on temperature is nonlinear, a negative saturation with a “breaking” point at 60 °C, whereas the ACE of fat follows a positive linear trend. Causal quality analysis of red and white wine is based on a large data set of eleven continuous variables of physical and chemical properties and quality assessments classified in ten classes, from 1 to 10. Each classification is obtained in triplicate by a panel of professional wine tasters. A non-structural double machine learning (DML) algorithm is applied to estimate the total ACE on quality. For both red and white wine, alcohol content has the key positive relative ACE of 0.35 quality/alcohol, while volatile acidity has the key negative ACE of -0.2 quality/acidity. The ACE estimates from the non-structural DML algorithm agree closely with those obtained from the SCMs.

Novelty and scientific contribution. This work presents novel methodologies and results for applying causal artificial intelligence models to the analysis of consumer assessment of food product quality. Applying Bayes network structural causal models (SCM) makes it possible to d-separate the pronounced confounding between parameters that affects noncausal regression models. SCM-based inference of average causal effects (ACE) provides substantiated and validated research hypotheses for new products and supports decisions on potential interventions to improve product design, new process introduction, process control, management, and marketing.
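The contrast between a noncausal regression and a DML estimate of the ACE can be illustrated on synthetic data. The sketch below is an assumption-laden toy, not the analysis from the paper: the data are simulated, the true effect `TRUE_EFFECT = 0.5` is invented for the example, and the nuisance models are plain one-variable linear fits, whereas DML in practice normally uses flexible ML learners. It shows the partialling-out form of DML with cross-fitting: predict both treatment and outcome from the confounder on held-out folds, then regress the outcome residuals on the treatment residuals.

```python
import random

def fit_line(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dml_effect(w, t, y, n_folds=5):
    """Partialling-out DML estimate of the effect of t on y given confounder w.

    Nuisance models E[t|w] and E[y|w] are fitted on the training folds and
    evaluated on the held-out fold (cross-fitting).
    """
    n = len(w)
    res_t, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        test = [i for i in range(n) if i % n_folds == k]
        w_train = [w[i] for i in train]
        a_t, b_t = fit_line(w_train, [t[i] for i in train])
        a_y, b_y = fit_line(w_train, [y[i] for i in train])
        for i in test:
            res_t[i] = t[i] - (a_t + b_t * w[i])
            res_y[i] = y[i] - (a_y + b_y * w[i])
    # Final stage: slope of outcome residuals on treatment residuals.
    num = sum(rt * ry for rt, ry in zip(res_t, res_y))
    den = sum(rt * rt for rt in res_t)
    return num / den

# Synthetic data (hypothetical): w confounds both treatment t and outcome y.
rng = random.Random(0)
TRUE_EFFECT = 0.5
n = 4000
w = [rng.gauss(0, 1) for _ in range(n)]
t = [0.8 * wi + rng.gauss(0, 1) for wi in w]
y = [TRUE_EFFECT * ti - 1.0 * wi + rng.gauss(0, 1) for ti, wi in zip(t, w)]

naive_slope = fit_line(t, y)[1]      # ignores the confounder
dml_estimate = dml_effect(w, t, y)   # adjusts for the confounder

print(f"true effect:  {TRUE_EFFECT:.2f}")
print(f"naive slope:  {naive_slope:.2f}")
print(f"DML estimate: {dml_estimate:.2f}")
```

In this simulated setting the naive slope is pulled far from the true effect because the confounder pushes treatment and outcome in opposite directions, while the cross-fitted residual regression recovers a value close to it. The same logic underlies the agreement check between DML and SCM estimates described above, although the real data and learners differ.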
Source journal: Food Technology and Biotechnology (Engineering and Technology: Biotechnology and Applied Microbiology)
CiteScore: 3.70
Self-citation rate: 0.00%
Articles published: 33
Review time: 12 months
Journal description: Food Technology and Biotechnology (FTB) is a diamond open access, peer-reviewed international quarterly scientific journal that publishes papers covering a wide range of topics, including molecular biology, genetic engineering, biochemistry, microbiology, biochemical engineering and biotechnological processing, food science, analysis of food ingredients and final products, food processing and technology, oenology and waste treatment. The journal is published by the University of Zagreb, Faculty of Food Technology and Biotechnology, Croatia. It is an official journal of the Croatian Society of Biotechnology and the Slovenian Microbiological Society, financed by the Croatian Ministry of Science and Education, and supported by the Croatian Academy of Sciences and Arts.