{"title":"Evaluation of artificial intelligence-based patient education models for irritable bowel syndrome.","authors":"Anand Kumar Raghavendran, Balaji Musunuri, Siddheesh Rajpurohit, Ganesh Pai C, Shiran Shetty, Pretty Kumari, Rakshand Shetty, Athish Shetty, Ganesh Bhat","doi":"10.1007/s12664-025-01872-7","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psycho-social burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs)-ChatGPT-4 and Gemini-1-for their performance in addressing IBS-related patient queries.</p><p><strong>Methods: </strong>Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]) and empathy was rated on a 4-point Likert scale by three reviewers.</p><p><strong>Results: </strong>Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in \"symptoms and diagnosis\" and \"treatment\", while mixed responses were most frequent in \"general understanding\" and \"lifestyle\". There was no significant difference in comprehensiveness (p = 0.67). 
Readability analysis showed both LLMs generated difficult-to-read content: Gemini's FRE score was 35.83 ± 3.31 vs. ChatGPT's 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT's responses were more empathetic, with all responses rated moderately empathetic; Gemini was mostly rated minimally empathetic (66.7%).</p><p><strong>Conclusion: </strong>While ChatGPT and Gemini provided extensive information, their limitations-such as complex language and occasional inaccuracies-must be addressed. Future improvements should focus on enhancing readability, contextual relevance and accuracy to better meet the diverse needs of patients and clinicians.</p>","PeriodicalId":13404,"journal":{"name":"Indian Journal of Gastroenterology","volume":" ","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Indian Journal of Gastroenterology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s12664-025-01872-7","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"GASTROENTEROLOGY & HEPATOLOGY","Score":null,"Total":0}
Abstract
Background: Irritable bowel syndrome (IBS) is a common functional gastrointestinal disorder with a significant psychosocial burden. Despite medical advancements, patient education on IBS remains inadequate. This study compared two large language models (LLMs), ChatGPT-4 and Gemini-1, for their performance in addressing IBS-related patient queries.
Methods: Thirty-nine IBS-related frequently asked questions (FAQs) from IBS organizations and hospital websites were categorized into six domains: general understanding, symptoms and diagnosis, causes, dietary considerations, treatment, and lifestyle factors. Responses from ChatGPT-4 and Gemini-1 were evaluated by two independent gastroenterologists for comprehensiveness and accuracy, with a third reviewer resolving disagreements. Readability was measured using five standardized indices (Flesch Reading Ease [FRE], Simple Measure of Gobbledygook [SMOG], Gunning Fog Index [GFI], Automated Readability Index [ARI], Reading Level Consensus [ARC]), and empathy was rated on a 4-point Likert scale by three reviewers.
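The Flesch Reading Ease index named above follows a fixed formula: 206.835 − 1.015 × (words/sentence) − 84.6 × (syllables/word), where higher scores indicate easier text. A minimal sketch of how such a score can be computed is below; the vowel-group syllable heuristic is a crude simplification of the validated syllable counters that published readability tools use:

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, discount a trailing silent 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text):
    # FRE = 206.835 - 1.015*(words/sentence) - 84.6*(syllables/word)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, the reported scores around 32-36 fall in the "difficult / college-level" band (roughly 30-50), which is how the abstract characterizes both models' output.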
Results: Gemini produced comprehensive and accurate answers for 94.9% (37/39) of questions, with two rated as mixed (vague/outdated). ChatGPT achieved 89.7% (35/39) comprehensive responses, with four rated mixed. Domain-wise, both models performed best in "symptoms and diagnosis" and "treatment", while mixed responses were most frequent in "general understanding" and "lifestyle". There was no significant difference in comprehensiveness (p = 0.67). Readability analysis showed both LLMs generated difficult-to-read content: Gemini's FRE score was 35.83 ± 3.31 vs. ChatGPT's 32.33 ± 5.57 (p = 0.21), corresponding to college-level proficiency. ChatGPT's responses were more empathetic, with all responses rated moderately empathetic; Gemini was mostly rated minimally empathetic (66.7%).
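The comprehensiveness comparison above (37/39 vs. 35/39 comprehensive responses, i.e. 2 vs. 4 rated mixed) is a 2 × 2 proportion test; the abstract does not state which test produced p = 0.67, but a Fisher's exact test on that table yields the same value. A self-contained sketch, assuming that choice of test:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins whose probability is no greater than the observed table's.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def hyper(k):
        # P(k successes in row 1 | fixed margins), hypergeometric
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = hyper(a)
    lo = max(0, row1 - (n - col1))  # smallest feasible count in row 1
    hi = min(row1, col1)            # largest feasible count in row 1
    # Small tolerance so floating-point ties still count as "as extreme".
    return sum(hyper(k) for k in range(lo, hi + 1)
               if hyper(k) <= p_obs * (1 + 1e-9))

# Gemini: 37 comprehensive / 2 mixed; ChatGPT: 35 comprehensive / 4 mixed
p_value = fisher_exact_two_sided(37, 2, 35, 4)
```

With these counts the two-sided p-value is about 0.67, matching the non-significant difference the abstract reports.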
Conclusion: While ChatGPT and Gemini provided extensive information, their limitations, such as complex language and occasional inaccuracies, must be addressed. Future improvements should focus on enhancing readability, contextual relevance, and accuracy to better meet the diverse needs of patients and clinicians.
About the journal
The Indian Journal of Gastroenterology aims to help doctors everywhere practise better medicine and to influence the debate on gastroenterology. To achieve these aims, we publish original scientific studies, state-of-the-art special articles, reports, and papers commenting on the clinical, scientific, and public health factors affecting aspects of gastroenterology. We shall be delighted to receive articles for publication in all of these categories, as well as letters commenting on the contents of the Journal or on issues of interest to our readers.