{"title":"Use of generative large language models for patient education on common surgical conditions: a comparative analysis between ChatGPT and Google Gemini.","authors":"Omar Mahmoud ELSenbawy, Keval Bhavesh Patel, Randev Ayodhya Wannakuwatte, Akhila N Thota","doi":"10.1007/s13304-025-02074-8","DOIUrl":null,"url":null,"abstract":"<p><p>There is a growing importance for patients to easily access information regarding their medical conditions to improve their understanding and participation in health care decisions. Artificial Intelligence (AI) has proven as a fast, efficient, and effective tool in educating patients regarding their health care conditions. The aim of the study is to compare the responses provided by AI tools, ChatGPT and Google Gemini, to assess for conciseness and understandability of information provided for the medical conditions Deep vein thrombosis, decubitus ulcers, and hemorrhoids. A cross-sectional original research design was conducted regarding the responses generated by ChatGPT and Google Gemini for the post-surgical complications of Deep vein thrombosis, decubitus ulcers, and hemorrhoids. Each response was evaluated by the Flesch-Kincaid calculator for total number of words, sentences, average words per sentence, average syllables per word, grade level, and ease score. Additionally, the similarity score was evaluated using QuillBot and reliability using a modified discern score. These results were then analyzed by the unpaired or two sample t-test to compare the averages between the two AI tools to conclude which one was superior. Chat GPT required a higher education level to understand as suggested by the higher grade levels and lower ease scores. The easiest brochure was for deep vein thrombosis which had the lowest ease score and highest grade level. ChatGPT displayed more similarity with information provided on the internet as calculated by the plagiarism calculator-Quill bot. The reliability score via the Modified Discern score showing both AI tools were similar. Although there is a difference in the various scores for each AI tool, based on the P values obtained there is not enough evidence to conclude the superiority of one AI tool over the other.</p>","PeriodicalId":23391,"journal":{"name":"Updates in Surgery","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Updates in Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s13304-025-02074-8","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
引用次数: 0
Abstract
There is a growing need for patients to easily access information about their medical conditions in order to improve their understanding of, and participation in, health care decisions. Artificial Intelligence (AI) has proven to be a fast, efficient, and effective tool for educating patients about their health conditions. The aim of this study is to compare the responses provided by two AI tools, ChatGPT and Google Gemini, and to assess the conciseness and understandability of the information they provide on deep vein thrombosis, decubitus ulcers, and hemorrhoids. A cross-sectional original research design was used to evaluate the responses generated by ChatGPT and Google Gemini for the post-surgical complications deep vein thrombosis, decubitus ulcers, and hemorrhoids. Each response was evaluated with the Flesch-Kincaid calculator for total number of words, number of sentences, average words per sentence, average syllables per word, grade level, and ease score. Additionally, similarity was assessed using QuillBot and reliability using a modified DISCERN score. These results were then analyzed with an unpaired (two-sample) t-test to compare the averages between the two AI tools and determine whether one was superior. ChatGPT required a higher education level to understand, as suggested by its higher grade levels and lower ease scores. The easiest brochure was the one for deep vein thrombosis, which had the lowest ease score and highest grade level. ChatGPT displayed more similarity with information available on the internet, as calculated by the plagiarism calculator QuillBot. The reliability scores obtained with the modified DISCERN instrument were similar for both AI tools. Although the various scores differ between the two AI tools, based on the P values obtained there is not enough evidence to conclude that one AI tool is superior to the other.
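For readers who want to see how the reported metrics fit together, the sketch below is a minimal Python illustration, not the authors' actual pipeline. It applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas to word, sentence, and syllable counts, and compares the two chatbots with an unpaired two-sample t-test via scipy.stats.ttest_ind; the score values used are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of the abstract's metrics (assumed workflow, not the authors' code).
from scipy import stats

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate U.S. school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical per-condition ease scores for illustration only
# (e.g. DVT, decubitus ulcers, hemorrhoids); the real values are in the paper.
chatgpt_ease = [45.2, 38.7, 41.9]
gemini_ease  = [55.4, 49.1, 52.3]

# Unpaired (two-sample) t-test comparing the two AI tools, as described in the abstract.
t_stat, p_value = stats.ttest_ind(chatgpt_ease, gemini_ease)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A p-value above the chosen significance threshold would, as the abstract concludes, leave insufficient evidence to call either tool superior.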
Journal introduction
Updates in Surgery (UPIS) was founded in 2010 as the official journal of the Italian Society of Surgery. It is an international, English-language, peer-reviewed journal dedicated to the surgical sciences. Its main goal is to offer a valuable update on the most recent developments in rapidly evolving surgical techniques, which compel the community of surgeons to rigorous debate and continuous refinement of standards of care. In this respect, position papers on the most debated surgical approaches and accreditation criteria have been published and remain welcome in the future.
Besides its focus on general surgery, the journal pays particular attention to cutting-edge topics and emerging surgical fields, which are published in monothematic issues guest edited by well-known experts.
Updates in Surgery considers various types of papers: editorials, comprehensive reviews, original studies, and technical notes related to specific surgical procedures and techniques in liver, colorectal, gastric, pancreatic, robotic, and bariatric surgery.