{"title":"Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome.","authors":"Anand Kumar Raghavendran, Balaji Musunuri, Siddheesh Rajpurohit, Ganesh Pai C, Shiran Shetty, Pretty Kumari, Rakshand Shetty, Athish Shetty, Ganesh Bhat","doi":"10.1007/s12664-025-01872-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psycho-social burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs)-ChatGPT-4 and Gemini-1-for their performance in addressing IBS-related patient queries.</p><p><strong>Methods: </strong>Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]) and empathy was rated on a 4-point Likert scale by three reviewers.</p><p><strong>Results: </strong>Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in \"symptoms and diagnosis\" and \"treatment\", while mixed responses were most frequent in \"general understanding\" and \"lifestyle\". There was no significant difference in comprehensiveness (p = 0.67). 
Readability analysis showed both LLMs generated difficult-to-read content: Gemini's FRE score was 35.83 ± 3.31 vs. ChatGPT's 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT's responses were more empathetic, with all responses rated moderately empathetic; Gemini was mostly rated minimally empathetic (66.7%).</p><p><strong>Conclusion: </strong>While ChatGPT and Gemini provided extensive information, their limitations-such as complex language and occasional inaccuracies-must be addressed. Future improvements should focus on enhancing readability, contextual relevance and accuracy to better meet the diverse needs of patients and clinicians.</p>","PeriodicalId":13404,"journal":{"name":"Indian Journal of Gastroenterology","volume":" ","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Indian Journal of Gastroenterology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s12664-025-01872-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
Abstract
Background: Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psychosocial burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs), ChatGPT-4 and Gemini-1, for their performance in addressing IBS-related patient queries.
Methods: Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment, and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]), and empathy was rated on a 4-point Likert scale by three reviewers.
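The Flesch Reading Ease index named above follows a fixed formula: 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), where higher scores indicate easier text. A minimal sketch of how such a score can be computed is below; the vowel-group syllable heuristic is a crude simplification of the validated syllable counters that published readability tools use:

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, discount a trailing silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    # FRE = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, the reported scores around 32-36 fall in the "difficult / college-level" band (roughly 30-50), which is how the abstract characterizes both models' output.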
Results: Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in "symptoms and diagnosis" and "treatment", while mixed responses were most frequent in "general understanding" and "lifestyle". There was no significant difference in comprehensiveness (p = 0.67). Readability analysis showed both LLMs generated difficult-to-read content: Gemini's FRE score was 35.83 ± 3.31 vs. ChatGPT's 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT's responses were more empathetic, with all responses rated moderately empathetic; Gemini was mostly rated minimally empathetic (66.7%).
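The comprehensiveness comparison above (37/39 vs. 35/39 comprehensive responses, i.e. 2 vs. 4 rated mixed) is a 2 × 2 proportion test; the abstract does not state which test produced p = 0.67, but a Fisher's exact test on that table yields the same value. A self-contained sketch, assuming that choice of test:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability is no greater than the observed table's.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def hyper(k):
        # P(k successes in row 1 | fixed margins), hypergeometric
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = hyper(a)
    lo = max(0, row1 - (n - col1))  # smallest feasible count in row 1
    hi = min(row1, col1)            # largest feasible count in row 1
    # Small tolerance so floating-point ties still count as "as extreme".
    return sum(hyper(k) for k in range(lo, hi + 1)
               if hyper(k) <= p_obs * (1 + 1e-9))

# Gemini: 37 comprehensive / 2 mixed; ChatGPT: 35 comprehensive / 4 mixed
p_value = fisher_exact_two_sided(37, 2, 35, 4)
```

With these counts the two-sided p-value is about 0.67, matching the non-significant difference the abstract reports.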
Conclusion: While ChatGPT and Gemini provided extensive information, their limitations, such as complex language and occasional inaccuracies, must be addressed. Future improvements should focus on enhancing readability, contextual relevance, and accuracy to better meet the diverse needs of patients and clinicians.
About the journal
The Indian Journal of Gastroenterology aims to help doctors everywhere practise better medicine and to influence the debate on gastroenterology. To achieve these aims, we publish original scientific studies, state-of-the-art special articles, reports, and papers commenting on the clinical, scientific, and public health factors affecting aspects of gastroenterology. We shall be delighted to receive articles for publication in all of these categories, as well as letters commenting on the contents of the Journal or on issues of interest to our readers.