Accuracy of Current Large Language Models and the Retrieval-Augmented Generation Model in Determining Dietary Principles in Chronic Kidney Disease.

IF 3.4 | CAS Zone 3, Medicine | JCR Q2, NUTRITION & DIETETICS

Journal of Renal Nutrition. Published: 2025-01-24. DOI: 10.1053/j.jrn.2025.01.004

Feray Gençer Bingöl, Duygu Ağagündüz, Mustafa Can Bingol


Citations: 0

Abstract

Objective: Large language models (LLMs) have emerged as powerful tools with significant potential for quickly accessing information in nutrition and health, as in many other fields. Retrieval-augmented generation (RAG) is a framework developed to increase the accuracy and capability of LLMs, and it has been incorporated into artificial intelligence (AI)-powered chatbot architectures. This study aimed to evaluate the accuracy of LLMs (Generative Pre-trained Transformer 4, Gemini, and Llama) and RAG in determining dietary principles in chronic kidney disease.

Design and methods: The nutrition guideline published by the National Kidney Foundation in 2020 was used as the external information source for the developed RAG model. Each of the four chatbots answered 12 medical nutrition therapy prompts for chronic kidney disease. The accuracy of the resulting 48 answers was evaluated on a 5-point Likert scale.
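As a rough illustration of the retrieve-then-generate pattern described above, the sketch ranks guideline passages against a prompt and prepends the best matches to the question before it would be sent to an LLM. It is not the authors' implementation. The bag-of-words cosine similarity (a simple stand-in for the embedding-based retrieval typical of RAG systems), the function names, and the placeholder passages are all assumptions, and the LLM generation call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric tokens (simplistic assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Return the k guideline chunks most similar to the query.
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    # Hypothetical prompt template: ground the answer in retrieved excerpts.
    context = "\n\n".join(retrieve(query, chunks, k))
    return (
        "Answer using only the guideline excerpts below.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


# Placeholder passages, not text from the actual guideline.
guideline_chunks = [
    "placeholder passage about dietary protein intake",
    "placeholder passage about sodium",
    "unrelated text",
]
print(build_prompt("protein intake", guideline_chunks, k=1))
```

In a real system the guideline would first be split into chunks, embedded with a dedicated model, and stored in a vector index; the simple token overlap here only keeps the example self-contained.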

Results: Gemini and RAG had the highest accuracy scores (median: 4.0), followed by Generative Pre-trained Transformer 4 (median: 2.5) and Llama (median: 1.5). In pairwise comparisons of accuracy scores, significant differences were detected between all chatbot pairs except Gemini and RAG.
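With 12 answers per chatbot (48 answers across four chatbots), the median is the average of the 6th and 7th ranked scores, which is how half-point medians such as 2.5 and 1.5 can arise on an integer 1 to 5 scale. The sketch assumes each answer received a single integer Likert score; the function name and the example values are hypothetical and are not the study's data.

```python
from statistics import median


def median_scores(scores_by_chatbot: dict[str, list[int]]) -> dict[str, float]:
    # Validate that every score lies on the 1-5 Likert scale.
    for name, scores in scores_by_chatbot.items():
        if not scores or any(s < 1 or s > 5 for s in scores):
            raise ValueError(f"invalid Likert scores for {name}")
    # statistics.median averages the two middle values for even-length lists.
    return {name: float(median(scores)) for name, scores in scores_by_chatbot.items()}


# Illustrative values only: six answers scored 1 and six scored 2.
print(median_scores({"example_chatbot": [1] * 6 + [2] * 6}))
```

The median is used rather than the mean because Likert scores are ordinal.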

Conclusion: These chatbots produced both completely correct answers and false information with potentially harmful clinical outcomes. Customizing LLMs for specific domains such as nutrition, or developing a nutrition-specific RAG framework that augments LLMs with current guidelines and articles, may be an important strategy for increasing the accuracy of AI-powered chatbots.

Source journal

Journal of Renal Nutrition (Medicine: Urology & Nephrology)

CiteScore: 5.70

Self-citation rate: 12.50%

Articles published: 146

Review time: 6.7 weeks

Journal description: The Journal of Renal Nutrition is devoted exclusively to renal nutrition science and renal dietetics. Its content is appropriate for nutritionists, physicians, and researchers working in nephrology. Each issue contains a state-of-the-art review, original research, articles on the clinical management and education of patients, a current literature review, and nutritional analyses of clinically relevant food products.