Machine learning and natural language processing models to predict the extent of food processing

IF 4 | Zone 2, Agricultural and Forestry Sciences | JCR Q2, CHEMISTRY, APPLIED
Nalin Arora , Sumit Bhagat , Riya Dhama , Ganesh Bagler
Journal of Food Composition and Analysis, Volume 146, Article 107938. Published 2025-06-18.
DOI: 10.1016/j.jfca.2025.107938
https://www.sciencedirect.com/science/article/pii/S0889157525007537
Citations: 0

Abstract

The dramatic increase in consumption of ultra-processed food has been associated with numerous adverse health effects. Given the public health consequences linked to ultra-processed food consumption, it is important to build computational models that predict the extent of processing of food products. We created a range of machine learning, deep learning, and NLP models to predict the extent of food processing by integrating the FNDDS dataset of food products and their nutrient profiles with their reported NOVA processing levels. Starting with the full nutritional panel of 102 features, we coarse-grained the feature set first to 65 nutrients by dropping flavonoids, and then to 13 nutrients by considering the FDA's 13-nutrient panel. LGBM Classifier and Random Forest emerged as the best models for the 102- and 65-nutrient panels, respectively, with F1-scores of 0.9411 and 0.9345 and MCCs of 0.8691 and 0.8543. For the 13-nutrient panel, Gradient Boost achieved the best F1-score of 0.9284 and an MCC of 0.8425. We also implemented NLP-based models, which exhibited state-of-the-art performance. Besides distilling the nutrients critical for model performance, we present a user-friendly web server for predicting the processing level of a food product from its nutrient panel: https://cosylab.iiitd.edu.in/food-processing/.¹
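To make the two reported evaluation metrics concrete, here is a minimal Python sketch of how a macro-averaged F1-score and the multiclass Matthews correlation coefficient (MCC) can be computed from NOVA labels. This is an illustration, not the authors' evaluation code: the abstract does not say which F1 averaging scheme was used, so macro averaging is an assumption, and the helper names (`confusion_counts`, `macro_f1`, `multiclass_mcc`) and the toy labels are hypothetical. The MCC uses the standard multiclass generalization (Gorodkin's R_K), which reduces to the familiar binary formula when there are only two classes.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, labels):
    """Confusion matrix as nested dicts: cm[true_label][pred_label] -> count."""
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    Assumption: macro averaging; the paper may have used another scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_counts(y_true, y_pred, labels)
    scores = []
    for k in labels:
        tp = cm[k][k]
        fp = sum(cm[t][k] for t in labels) - tp  # predicted k, truly another class
        fn = sum(cm[k][p] for p in labels) - tp  # truly k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass MCC (Gorodkin's R_K); equals the binary MCC for two classes."""
    labels = sorted(set(y_true) | set(y_pred))
    cm = confusion_counts(y_true, y_pred, labels)
    s = len(y_true)                                          # total samples
    c = sum(cm[k][k] for k in labels)                        # correctly classified
    t = {k: sum(cm[k].values()) for k in labels}             # true count per class
    p = {k: sum(cm[r][k] for r in labels) for k in labels}   # predicted count per class
    cov_tp = c * s - sum(t[k] * p[k] for k in labels)
    cov_pp = s * s - sum(p[k] ** 2 for k in labels)
    cov_tt = s * s - sum(t[k] ** 2 for k in labels)
    den = sqrt(cov_pp * cov_tt)
    return cov_tp / den if den else 0.0  # 0.0 when predictions or labels are all one class


if __name__ == "__main__":
    # Hypothetical toy NOVA labels (1 = unprocessed or minimally processed,
    # 4 = ultra-processed); not taken from the FNDDS dataset.
    y_true = [1, 1, 2, 3, 4, 4, 4, 4]
    y_pred = [1, 2, 2, 3, 4, 4, 4, 3]
    print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
    print(f"MCC      = {multiclass_mcc(y_true, y_pred):.4f}")
```

Because MCC draws on every cell of the confusion matrix, it is often preferred alongside F1 when classes are imbalanced, which is plausible for NOVA groups where ultra-processed items may dominate.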
Source journal
Journal of Food Composition and Analysis
Subject category: Engineering & Technology, Food Science & Technology
CiteScore: 6.20
Self-citation rate: 11.60%
Articles published: 601
Review time: 53 days
Journal description: The Journal of Food Composition and Analysis publishes manuscripts on scientific aspects of data on the chemical composition of human foods, with particular emphasis on actual data on the composition of foods; analytical methods; studies on the manipulation, storage, distribution and use of food composition data; and studies on the statistics, use and distribution of such data and data systems. The Journal's basis is nutrient composition, with increasing emphasis on bioactive non-nutrient and anti-nutrient components. Papers must provide sufficient description of the food samples, analytical methods, quality control procedures and statistical treatments of the data to permit the end users of the food composition data to evaluate the appropriateness of such data in their projects. The Journal does not publish papers on: microbiological compounds; sensory quality; aromatics/volatiles in food and wine; essential oils; organoleptic characteristics of food; physical properties; or clinical and pharmacology-related papers.