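Customized Multimodal Diabot-GPT-4o Enhances Accuracy of Image-Based Dietary Assessments in Dietetic Trainees in Taiwan: Validation Against Weighed Food Records.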

IF 6.9 · Zone 1 (Medicine) · Q1 NUTRITION & DIETETICS

American Journal of Clinical Nutrition · Pub Date: 2025-10-23 · DOI: 10.1016/j.ajcnut.2025.10.013

Yu Jie Chen, Chun-Chao Chang, Yen Nhi Hoang, Annie W Lin, Wen-Ling Lin, Cheng-Yu Lin, Ellyn Patricia, Janice Clarisa Tissadharma, Jovan Kuanca, Natasya Nobelta, Kimberly Alecia Theo, Dang Khanh Ngan Ho, Pin-Hui Wei, Jung-Su Chang


Citations: 0

Abstract



Background: Automated image-based dietary assessments (IBDAs) using multimodal artificial intelligence (AI) chatbots show strong potential. However, sources of error at the human-AI interface in real-world use remain unclear.

Objective: In this study, we validated a GPT-4o-powered chatbot for automated IBDAs and identified key sources of error in free-living settings.

Methods: In total, 714 food images were collected from 3-day weighed food records (WFRs) across 171 days from 57 young adults. Images were analyzed using four AI configurations: Diabot (DB), DBFN (customized GPT-4o), 4o, and 4oFN (non-customized), where "FN" indicates inclusion of the food name input. Portion sizes and nutrient estimates were compared to WFRs using Bland-Altman plots with equivalence testing at ±10%, ±15%, and ±20% bounds.
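The Bland-Altman comparison with equivalence testing described above can be sketched in Python. This is a minimal illustration, not the authors' analysis code: it assumes each equivalence bound (±10%, ±15%, ±20%) is expressed as a percentage of the mean weighed-food-record value, and it approximates the two one-sided tests (TOST) with a normal rather than a t distribution, which is only reasonable for large samples. The function names and toy values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def bland_altman(ai, ref):
    """Mean bias (AI minus reference) and 95% limits of agreement."""
    diffs = [a - r for a, r in zip(ai, ref)]
    bias = fmean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def equivalent_within(ai, ref, bound_pct, alpha=0.05):
    """Approximate TOST: is the (1 - 2*alpha) CI of the mean difference inside
    +/- bound_pct % of the reference mean? (Assumed bound definition; normal
    approximation instead of the t distribution.)"""
    diffs = [a - r for a, r in zip(ai, ref)]
    margin = bound_pct / 100 * fmean(ref)
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value, about 1.645
    lower = fmean(diffs) - z * se
    upper = fmean(diffs) + z * se
    return -margin < lower and upper < margin


# Toy portion sizes in grams (made-up values, not study data).
ref = [100, 200, 300, 400]  # weighed food record
ai = [105, 195, 310, 390]   # AI estimate
print(bland_altman(ai, ref))
for bound in (10, 15, 20):
    print(f"±{bound}%:", equivalent_within(ai, ref, bound))
```

Checking the 90% confidence interval against the bounds is equivalent to running both one-sided tests at α = 0.05, which is why a single interval comparison suffices here.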

Results: Using images alone, DB recognized 74% of food items versus 59% for 4o. All AI configurations provided accurate estimates of portion sizes (±10%-15%, coefficient of variation (CV): 13%), energy (±10%-20%, CV: 14%), and carbohydrates (±15%-20%, CV: 15%), but showed less consistency for fats (±10%-22%, CV: 24%) and proteins (±10%->20.2%, CV: 18%). The customized DBFN outperformed 4oFN, achieving higher accuracy across more nutrients within the ±10% (weight, energy, fats, saturated fats, potassium, and magnesium), ±15% (proteins and sodium), and ±20% (carbohydrates and calcium) bounds, and achieved the highest agreement with WFRs (Spearman ρ = 0.662-0.863; Lin's concordance correlation coefficient = 0.540-0.874). Common errors at the human-AI interface included inaccurate portion-size estimates, obscured food visibility in images, poorly constructed prompts, omission or intrusion errors, and system-specific limitations such as processing overload and configuration inconsistencies.

Conclusions: Customized AI chatbots improved automated IBDAs, yet accuracy depends on clear images for food visibility and portion-size fidelity. Standardized AI-input procedures (FN, cooking state, prompt structure, and configuration) and expert oversight to detect and correct AI hallucinations (fabricated items, units, or quantities) remain essential for reliable, interpretable estimates.
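The standardized inputs and expert checks recommended in the conclusions can be illustrated with a small, hypothetical sketch: a fixed-structure prompt built from the food name (FN) and cooking state, plus an item-level audit that lists omissions (foods in the weighed record that the AI missed) and intrusions (foods the AI reported that were not eaten, such as fabricated items). The `MealInput` record, the prompt wording, and the function names are assumptions for illustration; they are not the Diabot prompt or configuration used in the study.

```python
from dataclasses import dataclass


@dataclass
class MealInput:
    """Hypothetical standardized input for one food image."""
    food_names: list[str]  # FN: user-supplied food names
    cooking_state: str     # e.g. "cooked" or "raw"


def build_prompt(meal: MealInput) -> str:
    """Assemble a prompt with a fixed structure (illustrative wording only)."""
    names = ", ".join(meal.food_names)
    return (
        f"Foods: {names}. Cooking state: {meal.cooking_state}. "
        "For each listed food, estimate the portion size in grams and its "
        "energy (kcal), carbohydrate, protein, and fat (g). "
        "Do not add foods that are not listed."
    )


def audit_items(reference_items, ai_items):
    """Return (omissions, intrusions) after case-insensitive name matching."""
    ref = {name.strip().lower() for name in reference_items}
    out = {name.strip().lower() for name in ai_items}
    return sorted(ref - out), sorted(out - ref)


meal = MealInput(["rice", "braised pork"], "cooked")
print(build_prompt(meal))
print(audit_items(meal.food_names, ["Rice", "fried egg"]))
```

Exact name matching is deliberately simple; synonyms, mixed dishes, and implausible units or quantities would still need the expert review the abstract calls for.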


Source journal

American Journal of Clinical Nutrition (Medicine: Nutrition)

CiteScore: 12.40

Self-citation rate: 4.20%

Articles published: 332

Review time: 38 days

Journal description: American Journal of Clinical Nutrition is recognized as the most highly rated peer-reviewed, primary research journal in nutrition and dietetics. It focuses on publishing the latest research on various topics in nutrition, including but not limited to obesity, vitamins and minerals, nutrition and disease, and energy metabolism.

Purpose: The purpose of AJCN is to publish original research studies relevant to human and clinical nutrition; consider well-controlled clinical studies describing the scientific mechanisms, efficacy, and safety of dietary interventions in the context of disease prevention or health benefits; encourage public health and epidemiologic studies relevant to human nutrition; promote innovative investigations of nutritional questions employing epigenetic, genomic, proteomic, and metabolomic approaches; and include solicited editorials, book reviews, solicited or unsolicited review articles, invited controversy position papers, and letters to the Editor related to prior AJCN articles.

Peer Review Process: All submitted material with scientific content undergoes peer review by the Editors or their designees before acceptance for publication.