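Knowledge-informed data-driven modeling for sparse identification of governing equations for microbial inactivation processes in food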

Frontiers in food science and technology, Pub Date: 2022-10-07, DOI: 10.3389/frfst.2022.996399

Steve Zhang, Firnaaz Ahamed, Hyun‐Seob Song


Citations: 2

Abstract

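Preventing the growth of harmful microorganisms in food products is an important requirement for ensuring food safety and quality. Mathematical models that predict quantitative changes in microbial populations in food in response to variations in environmental conditions are useful tools in this regard. While equations for microbial inactivation have typically been formulated as polynomial functions, choosing the model order and terms empirically not only results in over- or underfitting but also makes it difficult to identify the key factors governing the target variable. To address this issue, we present a data-driven modeling pipeline that enables 1) automatic discovery of model equations through parsimonious selection of relevant terms from a pre-built library and 2) subsequent evaluation of the impacts of individual terms on the model output. Through case studies using literature data, we evaluated the effectiveness of our pipeline in predicting the D-value (i.e., the time taken to reduce the microbial population to 10% of its initial level) as a function of multiple factors, including temperature, pH, water activity, NaCl content, and phosphate level. In doing so, we determined basic functional forms of the input and output variables based on their known relationships, e.g., by accounting for the Arrhenius dependence of the D-value on temperature. Incorporating such theoretical knowledge into the pipeline improved model accuracy. Using the Akaike information criterion, we optimally determined the hyperparameters that control the trade-off between model accuracy and sparsity. We found the literature models benchmarked in this study to be over- or under-determined and consequently proposed better-structured and more accurate equations. The subsequent global sensitivity analysis allowed us to evaluate the context-dependent impacts of key factors on the D-value. The pipeline presented in this work is readily applicable to many other related nonlinear systems and is not limited to microbial inactivation datasets.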
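The D-value and the Arrhenius-informed transformation mentioned in the abstract can be illustrated with a short sketch. It assumes first-order (log-linear) inactivation, log10 N(t) = log10 N0 - t/D, and an Arrhenius rate constant k = A exp(-Ea/(R T)). Under those assumptions D = ln(10)/k, so ln D = ln(ln(10)/A) + (Ea/R)(1/T) is linear in 1/T. The helper names (`linear_fit`, `d_value`, `fit_arrhenius_d`) and all numbers are hypothetical illustrations, not the authors' code or data, and the exact transformation used in the paper may differ.

```python
import math


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def d_value(times, log10_counts):
    """D-value from log-linear survival data: log10 N(t) = log10 N0 - t / D."""
    _, slope = linear_fit(times, log10_counts)
    return -1.0 / slope  # the slope is negative during inactivation


def fit_arrhenius_d(temps_c, d_values):
    """Fit ln D = a + b / T with T in kelvin; b plays the role of Ea / R."""
    inv_t = [1.0 / (tc + 273.15) for tc in temps_c]
    ln_d = [math.log(d) for d in d_values]
    return linear_fit(inv_t, ln_d)


# Hypothetical survival curve: a one-log drop per minute means D = 1 min.
print(d_value([0, 1, 2, 3], [6.0, 5.0, 4.0, 3.0]))  # → 1.0

# Hypothetical D-values generated from known a, b; the fit should recover them.
temps = [55.0, 60.0, 65.0, 70.0]
true_a, true_b = -30.0, 12000.0
ds = [math.exp(true_a + true_b / (t + 273.15)) for t in temps]
a, b = fit_arrhenius_d(temps, ds)
print(a, b)
```

Fitting ln D against 1/T, rather than D against T, is what turns the assumed Arrhenius relationship into a straight line that a sparse linear-in-parameters regression can handle.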
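Step 1) of the pipeline, parsimonious selection of terms from a pre-built library with the accuracy/sparsity trade-off set by the Akaike information criterion, can be sketched in the spirit of SINDy (sparse identification of nonlinear dynamics). The sketch assumes a polynomial candidate library, sequentially thresholded least squares (STLSQ) as the sparse regressor, and the least-squares form AIC = n ln(RSS/n) + 2k. The paper's actual library, regressor, and AIC variant may differ; all function names and the synthetic data are hypothetical. It is written in plain Python so it runs without extra packages.

```python
import itertools
import math
import random


def build_library(X, names, degree=2):
    """Candidate library: a constant plus every monomial of the inputs up to `degree`."""
    combos = [()]
    for d in range(1, degree + 1):
        combos += list(itertools.combinations_with_replacement(range(len(names)), d))
    labels = ["*".join(names[i] for i in c) or "1" for c in combos]
    theta = [[math.prod(row[i] for i in c) for c in combos] for row in X]
    return theta, labels


def lstsq(theta, y, cols):
    """Least squares on the columns in `cols`, via the normal equations and Gauss-Jordan."""
    k = len(cols)
    A = [[sum(r[i] * r[j] for r in theta) for j in cols]
         + [sum(r[i] * yi for r, yi in zip(theta, y))] for i in cols]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))  # partial pivoting
        A[p], A[piv] = A[piv], A[p]
        for r in range(k):
            if r != p:
                f = A[r][p] / A[p][p]
                A[r] = [a - f * b for a, b in zip(A[r], A[p])]
    return [A[i][k] / A[i][i] for i in range(k)]


def stlsq(theta, y, threshold, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small coefficients, refit."""
    active = list(range(len(theta[0])))
    for _ in range(max_iter):
        coef = lstsq(theta, y, active)
        keep = [j for j, c in zip(active, coef) if abs(c) >= threshold]
        if keep == active:
            break
        active = keep
    return dict(zip(active, lstsq(theta, y, active)))


def aic(theta, y, coef):
    """Least-squares AIC = n ln(RSS / n) + 2k, with k the number of retained terms."""
    n = len(y)
    rss = sum((yi - sum(c * row[j] for j, c in coef.items())) ** 2
              for row, yi in zip(theta, y))
    return n * math.log(max(rss, 1e-300) / n) + 2 * len(coef)  # guard against log(0)


def select_by_aic(theta, y, thresholds):
    """Try each sparsity threshold and keep the model with the lowest AIC."""
    best = None
    for lam in thresholds:
        coef = stlsq(theta, y, lam)
        score = aic(theta, y, coef)
        if best is None or score < best[0]:
            best = (score, lam, coef)
    return best


# Hypothetical data from a known sparse model: y = 2 + 3*x1 - 1.5*x1*x2 + noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [2.0 + 3.0 * a - 1.5 * a * b + rng.gauss(0, 0.01) for a, b in X]
theta, labels = build_library(X, ["x1", "x2"])
score, lam, coef = select_by_aic(theta, y, [0.05, 0.1, 0.5, 2.5])
print(lam, {labels[j]: round(c, 3) for j, c in coef.items()})
```

Because the threshold acts on raw coefficient magnitudes, inputs are usually scaled to comparable ranges before fitting; here both synthetic inputs already lie in [-1, 1]. Too large a threshold (2.5 above) discards real terms and is penalized through a large RSS, while the 2k term penalizes keeping spurious ones.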
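The abstract does not state which global sensitivity analysis method was used. As one common option, first-order Sobol indices can be estimated by Monte Carlo; the sketch below uses the Saltelli (2010) estimator on the unit hypercube and maps physical input ranges inside the model function. `sobol_first_order`, `toy_log_d`, its coefficients, and the input ranges are hypothetical and are not the fitted models from the paper.

```python
import random


def sobol_first_order(f, d, n=20000, seed=0):
    """Monte Carlo estimate of first-order Sobol indices of f on [0, 1]^d.

    Saltelli (2010) estimator: V_i ~ mean(f(B) * (f(AB_i) - f(A))), where AB_i is A
    with its i-th column taken from B; S_i = V_i / Var(Y).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(d)] for _ in range(n)]
    B = [[rng.random() for _ in range(d)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    pooled = fA + fB
    mean = sum(pooled) / len(pooled)
    var = sum((v - mean) ** 2 for v in pooled) / (len(pooled) - 1)
    indices = []
    for i in range(d):
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Centering f(B) reduces Monte Carlo noise without changing what is
        # estimated, since E[f(AB_i) - f(A)] = 0.
        vi = sum((yb - mean) * (yab - ya) for ya, yb, yab in zip(fA, fB, fABi)) / n
        indices.append(vi / var)
    return indices


def toy_log_d(u):
    """Hypothetical ln D model (NOT a fitted model from the paper)."""
    temp_k = 328.15 + 15.0 * u[0]  # u[0] in [0, 1] mapped to 55-70 °C
    ph = 4.0 + 3.0 * u[1]          # u[1] in [0, 1] mapped to pH 4-7
    return -30.0 + 12000.0 / temp_k + 0.2 * ph


print(sobol_first_order(toy_log_d, 2))
```

The indices depend on the input ranges chosen, so repeating the calculation over narrower ranges is one way to see how a factor's importance shifts with context.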


