Customized Multimodal Diabot-GPT-4o Enhances Accuracy of Image-Based Dietary Assessments in Dietetic Trainees in Taiwan: Validation Against Weighed Food Records.
Yu Jie Chen, Chun-Chao Chang, Yen Nhi Hoang, Annie W Lin, Wen-Ling Lin, Cheng-Yu Lin, Ellyn Patricia, Janice Clarisa Tissadharma, Jovan Kuanca, Natasya Nobelta, Kimberly Alecia Theo, Dang Khanh Ngan Ho, Pin-Hui Wei, Jung-Su Chang
{"title":"Customized Multimodal Diabot-GPT-4o Enhances Accuracy of Image-Based Dietary Assessments in Dietetic Trainees in Taiwan: Validation Against Weighed Food Records.","authors":"Yu Jie Chen, Chun-Chao Chang, Yen Nhi Hoang, Annie W Lin, Wen-Ling Lin, Cheng-Yu Lin, Ellyn Patricia, Janice Clarisa Tissadharma, Jovan Kuanca, Natasya Nobelta, Kimberly Alecia Theo, Dang Khanh Ngan Ho, Pin-Hui Wei, Jung-Su Chang","doi":"10.1016/j.ajcnut.2025.10.013","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Automated image-based dietary assessments (IBDAs) using multimodal artificial intelligence (AI) chatbots show strong potential. However, sources of error at the human-AI interface in real-world use remain unclear.</p><p><strong>Objective: </strong>In this study, we validated a GPT-4o-powered chatbot for automated IBDAs and identified key sources of error in free-living settings.</p><p><strong>Methods: </strong>In total, 714 food images were collected from 3-day weighed food records (WFRs) across 171 days from 57 young adults. Images were analyzed using four AI configurations: Diabot (DB), DBFN (customized GPT-4o), 4o, and 4oFN (non-customized), where \"FN\" indicates inclusion of the food name input. Portion sizes and nutrient estimates were compared to WFRs using Bland-Altman plots with equivalence testing at ±10%, ±15%, and ±20% bounds.</p><p><strong>Results: </strong>Using images alone, DB recognized 74% of food items versus 59% for 4o. All AI configurations provided accurate estimates of portion sizes (±10%-15%, coefficient of variation (CV): 13%), energy (±10%-20%, CV: 14%), and carbohydrates (±15%-20%, CV: 15%), but showed less consistency for fats (±10%-22%, CV: 24%) and proteins (±10%->20.2%, CV: 18%). 
The custom DBFN outperformed 4oFN, achieving higher accuracy across more nutrients within the ±10% (weight, energy, fats, saturated fats, potassium, and magnesium), ±15% (proteins and sodium), and ±20% (carbohydrates and calcium) bounds and achieved the highest agreement with WFRs (Spearman ρ = 0.863-0.662; Lin's concordance correlation coefficient = 0.874-0.540). Common errors at the human-AI interface included inaccurate portion size estimates, obscured food visibility in images, poorly constructed prompts, omission or intrusion errors, and system-specific limitations such as processing overload and configuration inconsistencies.</p><p><strong>Conclusions: </strong>Customized AI chatbots improved automated IBDAs, yet accuracy depends on clear images for food visibility and portion-size fidelity. Standardized AI-input procedures (FN, cooking state, prompt structure, and configuration) and expert oversight to detect and correct AI hallucinations (fabricated items, units, or quantities) remain essential for reliable, interpretable estimates.</p>","PeriodicalId":50813,"journal":{"name":"American Journal of Clinical Nutrition","volume":" ","pages":""},"PeriodicalIF":6.9000,"publicationDate":"2025-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Clinical Nutrition","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.ajcnut.2025.10.013","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"NUTRITION & DIETETICS","Score":null,"Total":0}
Abstract
Background: Automated image-based dietary assessments (IBDAs) using multimodal artificial intelligence (AI) chatbots show strong potential. However, sources of error at the human-AI interface in real-world use remain unclear.
Objective: In this study, we validated a GPT-4o-powered chatbot for automated IBDAs and identified key sources of error in free-living settings.
Methods: In total, 714 food images were collected from 3-day weighed food records (WFRs) across 171 days from 57 young adults. Images were analyzed using four AI configurations: Diabot (DB), DBFN (customized GPT-4o), 4o, and 4oFN (non-customized), where "FN" indicates inclusion of the food name input. Portion sizes and nutrient estimates were compared to WFRs using Bland-Altman plots with equivalence testing at ±10%, ±15%, and ±20% bounds.
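As an illustration of the comparison described in the Methods, the following is a minimal sketch of a Bland-Altman summary with a confidence-interval-based equivalence check. It is not the authors' analysis code. The function names, the choice to express differences as a percentage of the weighed reference, the normal approximation to the confidence interval, and the sample values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bland_altman(estimates, references):
    """Return (bias, lower LoA, upper LoA, diffs) for percent differences.

    Assumption: each difference is expressed as a percentage of the
    weighed reference value; the study may define it differently.
    """
    diffs = [100.0 * (e - r) / r for e, r in zip(estimates, references)]
    bias = mean(diffs)
    sd = stdev(diffs)
    # 95% limits of agreement: bias +/- 1.96 standard deviations.
    return bias, bias - 1.96 * sd, bias + 1.96 * sd, diffs


def is_equivalent(diffs, bound_pct, alpha=0.05):
    """Check whether the (1 - 2*alpha) CI of the mean lies within +/- bound_pct.

    Assumption: a normal approximation replaces the t distribution,
    which is only reasonable for larger samples.
    """
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = mean(diffs) - z * se, mean(diffs) + z * se
    return -bound_pct < lower and upper < bound_pct


# Hypothetical portion weights in grams (AI estimate vs. weighed record).
ai_estimates = [105.0, 190.0, 310.0, 395.0]
weighed = [100.0, 200.0, 300.0, 400.0]

bias, loa_low, loa_high, diffs = bland_altman(ai_estimates, weighed)
for bound in (10, 15, 20):
    print(f"within +/-{bound}%: {is_equivalent(diffs, bound)}")
```

Checking the same differences against successively wider bounds mirrors the ±10%, ±15%, and ±20% tiers reported in the abstract: a nutrient that fails the narrow bound may still pass a wider one.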
Results: Using images alone, DB recognized 74% of food items versus 59% for 4o. All AI configurations provided accurate estimates of portion sizes (±10%-15%, coefficient of variation (CV): 13%), energy (±10%-20%, CV: 14%), and carbohydrates (±15%-20%, CV: 15%), but showed less consistency for fats (±10%-22%, CV: 24%) and proteins (±10% to >20.2%, CV: 18%). The custom DBFN outperformed 4oFN, achieving higher accuracy across more nutrients within the ±10% (weight, energy, fats, saturated fats, potassium, and magnesium), ±15% (proteins and sodium), and ±20% (carbohydrates and calcium) bounds, and achieving the highest agreement with WFRs (Spearman ρ = 0.863-0.662; Lin's concordance correlation coefficient = 0.874-0.540). Common errors at the human-AI interface included inaccurate portion size estimates, obscured food visibility in images, poorly constructed prompts, omission or intrusion errors, and system-specific limitations such as processing overload and configuration inconsistencies.
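For readers unfamiliar with Lin's concordance correlation coefficient, the sketch below shows how it can be computed and why it differs from a plain correlation: it penalizes systematic offsets between the two methods as well as scatter. This is an illustrative implementation of the standard formula, not the authors' code, and the function name and sample values are hypothetical.

```python
from statistics import mean


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) variances and covariance, following the
    standard definition: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2).
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical values: the second series is the first shifted by a constant.
# The two series are perfectly correlated, yet the constant bias lowers the CCC.
reference = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]

ccc_identical = lins_ccc(reference, reference)
ccc_shifted = lins_ccc(reference, shifted)
print(ccc_identical, ccc_shifted)
```

A constant overestimate leaves rank-based and linear correlations unchanged but reduces the concordance coefficient, which is why agreement studies often report both.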
Conclusions: Customized AI chatbots improved automated IBDAs, yet accuracy depends on clear images for food visibility and portion-size fidelity. Standardized AI-input procedures (FN, cooking state, prompt structure, and configuration) and expert oversight to detect and correct AI hallucinations (fabricated items, units, or quantities) remain essential for reliable, interpretable estimates.
About the journal:
American Journal of Clinical Nutrition is recognized as the most highly rated peer-reviewed, primary research journal in nutrition and dietetics. It focuses on publishing the latest research on various topics in nutrition, including but not limited to obesity, vitamins and minerals, nutrition and disease, and energy metabolism.
Purpose:
The purpose of AJCN is to:
Publish original research studies relevant to human and clinical nutrition.
Consider well-controlled clinical studies describing scientific mechanisms, efficacy, and safety of dietary interventions in the context of disease prevention or health benefits.
Encourage public health and epidemiologic studies relevant to human nutrition.
Promote innovative investigations of nutritional questions employing epigenetic, genomic, proteomic, and metabolomic approaches.
Include solicited editorials, book reviews, solicited or unsolicited review articles, invited controversy position papers, and letters to the Editor related to prior AJCN articles.
Peer Review Process:
All submitted material with scientific content undergoes peer review by the Editors or their designees before acceptance for publication.