Machine learning and natural language processing models to predict the extent of food processing

IF 4.0 | Region 2, Agricultural and Forestry Sciences | JCR Q2, Chemistry, Applied

Journal of Food Composition and Analysis | Pub Date: 2025-06-18 | DOI: 10.1016/j.jfca.2025.107938

Nalin Arora, Sumit Bhagat, Riya Dhama, Ganesh Bagler

Volume 146, Article 107938. Full text: https://www.sciencedirect.com/science/article/pii/S0889157525007537

Citations: 0

Abstract

The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given these public health consequences, it is highly relevant to build computational models that predict the degree of processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing level. Starting with the full nutritional panel of 102 features, we coarse-grained the features to 65 nutrients by dropping flavonoids and then to 13 nutrients by considering the FDA's 13-nutrient panel. LGBM Classifier and Random Forest emerged as the best models for 102 and 65 nutrients, respectively, with F1-scores of 0.9411 and 0.9345 and Matthews correlation coefficients (MCC) of 0.8691 and 0.8543. For the 13-nutrient panel, Gradient Boost achieved the best F1-score of 0.9284 and an MCC of 0.8425. We also implemented NLP-based models, which exhibited state-of-the-art performance. Besides distilling the nutrients critical for model performance, we present a user-friendly web server for predicting processing level based on the nutrient panel of a food product: https://cosylab.iiitd.edu.in/food-processing/
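
The abstract reports two evaluation metrics, F1-score and MCC, for a multi-class target (NOVA processing levels). As a rough illustration of how such numbers can be computed, the sketch below implements a support-weighted F1 and the multi-class generalization of the Matthews correlation coefficient in plain Python. The abstract does not say which F1 averaging the authors used, so the weighted average here is an assumption. The function names and the toy labels are hypothetical and are not taken from the paper or from FNDDS.

```python
from collections import Counter
from math import sqrt


def _pair_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support.

    Assumption: weighted averaging; the abstract does not specify the scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = _pair_counts(y_true, y_pred)
    total = len(y_true)
    score = 0.0
    for k in labels:
        tp = pairs[(k, k)]
        predicted = sum(pairs[(t, k)] for t in labels)  # times k was predicted
        actual = sum(pairs[(k, p)] for p in labels)     # times k truly occurred
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += f1 * actual / total
    return score


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient from label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = _pair_counts(y_true, y_pred)
    n = len(y_true)
    correct = sum(pairs[(k, k)] for k in labels)
    true_counts = [sum(pairs[(k, p)] for p in labels) for k in labels]
    pred_counts = [sum(pairs[(t, k)] for t in labels) for k in labels]
    numerator = correct * n - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = sqrt(
        (n * n - sum(p * p for p in pred_counts))
        * (n * n - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Invented NOVA-style labels (1 to 4), purely for illustration.
    truth = [1, 1, 2, 3, 4, 4, 4, 4]
    guess = [1, 2, 2, 3, 4, 4, 4, 3]
    print("weighted F1:", round(weighted_f1(truth, guess), 4))
    print("MCC:", round(multiclass_mcc(truth, guess), 4))
```

MCC takes every cell of the confusion matrix into account, which is one reason it is often reported next to F1 when class sizes are uneven.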

Source journal

Journal of Food Composition and Analysis (Engineering and Technology: Food Science and Technology)

CiteScore: 6.20

Self-citation rate: 11.60%

Articles published: 601

Review time: 53 days

Journal description: The Journal of Food Composition and Analysis publishes manuscripts on scientific aspects of data on the chemical composition of human foods, with particular emphasis on actual data on composition of foods; analytical methods; studies on the manipulation, storage, distribution and use of food composition data; and studies on the statistics, use and distribution of such data and data systems. The Journal's basis is nutrient composition, with increasing emphasis on bioactive non-nutrient and anti-nutrient components. Papers must provide sufficient description of the food samples, analytical methods, quality control procedures and statistical treatments of the data to permit the end users of the food composition data to evaluate the appropriateness of such data in their projects. The Journal does not publish papers on: microbiological compounds; sensory quality; aromatics/volatiles in food and wine; essential oils; organoleptic characteristics of food; physical properties; or clinical papers and pharmacology-related papers.