Explainable AI-driven evaluation of plant protein rheology using tree-based and Gaussian process machine learning models

IF 6 | Zone 2 (Engineering Technology) | JCR Q1 (Engineering, Multidisciplinary)
Mustafa Tahsin Yilmaz, Salman Badurayq, Kemal Polat, Ahmad H. Milyani, Abdulaziz S. Alkabaa, Osman Gul, Furkan Turker Saricaoglu
Ain Shams Engineering Journal, vol. 16, no. 9, Article 103565. Published 2025-06-18.
DOI: 10.1016/j.asej.2025.103565
https://www.sciencedirect.com/science/article/pii/S2090447925003065

Abstract

In this study, we compared the explainability of Decision Tree Regressor (DTR) and Gaussian Process Regressor (GPR) models for predicting the shear stress and viscosity of sesame protein isolate (SPI) systems, employing explainable machine learning (EML) techniques to elucidate complex, nonlinear relationships among processing parameters. SPI samples were processed at pressure levels ranging from 0 to 100 MPa and ion concentration (IC) values from 0 to 200 mM. The DTR model accurately predicted shear stress ($R^2 = 0.999$), while the GPR model achieved high performance for viscosity prediction ($R^2 = 0.9925$). Formally, the modeling task is framed as learning a predictive mapping $f:\mathbb{R}^p \to \mathbb{R}$, where $x \in \mathbb{R}^p$ denotes the vector of predictors (pressure, IC, shear rate) and $y \in \mathbb{R}$ is the target variable (shear stress or viscosity), by minimizing a loss function such as the mean squared error. Interpretation of model predictions using SHapley Additive exPlanations (SHAP), permutation importance, and partial dependence analysis revealed that pressure and IC are the most influential factors affecting shear stress and viscosity, with pressure inducing protein conformational changes that alter rheological properties. The shear rate had a smaller direct impact within the systems examined. Partial Dependence Plots (PDPs) from the DTR model revealed strong, nearly linear positive relationships between pressure and shear stress, while the GPR model showed more nuanced responses, highlighting the models' differing sensitivities. Variance-Based Sensitivity Indices (VBSIs) further quantified these influences, with pressure and IC showing higher sensitivity scores in the DTR model than in the GPR model. Permutation importance and SHAP interaction analyses corroborated these results, emphasizing the dominant role of pressure and IC, both independently and interactively, in determining shear stress.
In contrast, viscosity predictions were influenced by more distributed and subtle interactions among all features. Explainable machine learning techniques thus enable a comprehensive understanding of feature relevance in complex, nonlinear rheological systems and help elucidate viscosity development in sesame protein systems through rheological indices. This approach ensures no bias toward formulation composition and applied pressure, offering valuable insights for optimizing formulation and processing conditions in food applications to enhance the functional properties of SPI-based products.
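
As a rough illustration of one interpretation technique named in the abstract, permutation importance under a mean squared error loss can be sketched as follows. This is not the authors' code or data: `toy_model`, its coefficients, the synthetic samples, and the shear-rate range are all assumptions made for the example, and only the pressure and IC ranges come from the abstract. The idea is to shuffle one predictor at a time and record how much the error grows.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, the example loss named in the abstract."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Average increase in MSE when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other predictors intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted regressor; NOT the paper's DTR or GPR.
def toy_model(row: Row) -> float:
    pressure, ic, shear_rate = row
    return 0.5 * pressure + 0.1 * ic + 0.001 * shear_rate


data_rng = random.Random(42)
# Synthetic predictors: pressure 0-100 MPa and IC 0-200 mM as in the abstract;
# the shear-rate range (0-100) is an assumption for illustration only.
X = [
    [data_rng.uniform(0, 100), data_rng.uniform(0, 200), data_rng.uniform(0, 100)]
    for _ in range(200)
]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
for name, score in zip(["pressure", "IC", "shear rate"], scores):
    print(f"{name}: {score:.4f}")
```

Because the toy coefficients are arbitrary, the resulting ranking only demonstrates the mechanics and says nothing about the SPI measurements. With a real fitted model, `predict` would wrap that model's prediction call.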
Source journal: Ain Shams Engineering Journal (Engineering: General Engineering)
CiteScore: 10.80
Self-citation rate: 13.30%
Articles published: 441
Review time: 49 weeks
Journal description: Ain Shams Engineering Journal is an international journal devoted to the publication of peer-reviewed, original, high-quality research papers and review papers on both traditional topics and those of emerging science and technology. Areas of theoretical and fundamental interest are welcome, as are those concerning industrial applications, emerging instrumental techniques, and topics with practical application to an aspect of human endeavor, such as preservation of the environment, health, and waste disposal. The overall focus is on original and rigorous scientific research results that have generic significance. Ain Shams Engineering Journal focuses on aspects of mechanical engineering, electrical engineering, civil engineering, chemical engineering, petroleum engineering, environmental engineering, and architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, such as nanotechnology, materials science, and computational methods, as well as applied basic sciences: engineering mathematics, physics, and chemistry.