Accuracy of Current Large Language Models and The Retrieval Augmented Generation Model in Determining Dietary Principles in Chronic Kidney Disease.

Impact factor 3.4 | Zone 3, Medicine | JCR Q2, Nutrition & Dietetics
Feray Gençer Bingöl, Duygu Ağagündüz, Mustafa Can Bingöl
Journal of Renal Nutrition. Journal Article. Published 2025-01-24. DOI: 10.1053/j.jrn.2025.01.004 (https://doi.org/10.1053/j.jrn.2025.01.004)
Citations: 0

Abstract

Objective: Large language models (LLMs) have emerged as powerful tools with significant potential for quickly accessing information in nutrition and health, as in many other fields. Retrieval-augmented generation (RAG) is a framework for artificial intelligence (AI)-powered chatbots that was developed to increase the accuracy and capability of LLMs. This study aimed to evaluate the accuracy of LLMs (GPT4, Gemini, and Llama) and RAG in determining dietary principles in chronic kidney disease (CKD).

Design and methods: The nutrition guideline published by the National Kidney Foundation in 2020 was used as the external information source for the developed RAG model. Each of the four chatbots answered 12 medical nutritional therapy prompts for CKD, and the accuracy of the resulting 48 answers was rated on a 5-point Likert scale.
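The retrieval step of a RAG setup like the one described above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not say how the guideline was split, indexed, or searched, so the bag-of-words cosine similarity, the function names (`tokenize`, `retrieve`, `build_prompt`), and the placeholder passages are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Placeholder passages standing in for guideline excerpts (assumption:
# the real system would load text from the external guideline document).
PASSAGES = [
    "Placeholder excerpt 1: guideline text on protein intake goes here.",
    "Placeholder excerpt 2: guideline text on sodium intake goes here.",
    "Placeholder excerpt 3: guideline text on energy intake goes here.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q_vec, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved excerpts to the question for the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the guideline excerpts below.\n"
        f"{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What protein intake is recommended?", PASSAGES, k=1))
```

The prompt returned by `build_prompt` would then be sent to a language model. A production system would more likely use embedding-based search, but the flow (retrieve relevant excerpts, then generate from them) is the same.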

Results: Gemini and RAG had the highest accuracy scores (median: 4.0), followed by GPT4 (median: 2.5) and Llama (median: 1.5). In pairwise comparisons of accuracy scores between chatbots, a significant difference was detected for every pair except Gemini and RAG.
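The arithmetic behind these summary figures can be sketched as below. The score list is made up for illustration and is not the study's data; the abstract does not name the statistical test, so none is implemented here. The sketch only shows how a half-point median arises from 12 integer Likert ratings and how many pairwise comparisons four chatbots imply.

```python
from itertools import combinations
from statistics import median

CHATBOTS = ["GPT4", "Gemini", "Llama", "RAG"]  # the four chatbots named in the abstract
N_PROMPTS = 12  # prompts answered by each chatbot


def pairwise(bots):
    """All unordered chatbot pairs that a pairwise comparison would cover."""
    return list(combinations(bots, 2))


# Hypothetical ratings for one chatbot (NOT the study's data): 12 integer
# scores on a 1-5 Likert scale.
made_up_scores = [2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]

if __name__ == "__main__":
    # With an even number of ratings, the median is the mean of the two
    # middle values, so it can fall between two scale points.
    print("median of made-up scores:", median(made_up_scores))
    print("total answers:", N_PROMPTS * len(CHATBOTS))
    for a, b in pairwise(CHATBOTS):
        print("compare", a, "vs", b)
```

Four chatbots give six pairs, so "every pair except Gemini and RAG" corresponds to five of the six comparisons.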

Conclusion: The chatbots produced both completely correct answers and false information that could lead to harmful clinical outcomes. Customizing LLMs for specific areas such as nutrition, or developing a nutrition-specific RAG framework that augments LLMs with current guidelines and articles, may be an important strategy for increasing the accuracy of AI-powered chatbots.

Source journal
Journal of Renal Nutrition (Medicine: Urology & Nephrology)
CiteScore: 5.70
Self-citation rate: 12.50%
Articles published: 146
Review time: 6.7 weeks
About the journal: The Journal of Renal Nutrition is devoted exclusively to renal nutrition science and renal dietetics. Its content is appropriate for nutritionists, physicians, and researchers working in nephrology. Each issue contains a state-of-the-art review, original research, articles on the clinical management and education of patients, a current literature review, and nutritional analysis of food products that have clinical relevance.