{"title":"Use of generative large language models for patient education on common surgical conditions: a comparative analysis between ChatGPT and Google Gemini.","authors":"Omar Mahmoud ELSenbawy, Keval Bhavesh Patel, Randev Ayodhya Wannakuwatte, Akhila N Thota","doi":"10.1007/s13304-025-02074-8","DOIUrl":null,"url":null,"abstract":"<p><p>There is a growing importance for patients to easily access information regarding their medical conditions to improve their understanding and participation in health care decisions. Artificial Intelligence (AI) has proven as a fast, efficient, and effective tool in educating patients regarding their health care conditions. The aim of the study is to compare the responses provided by AI tools, ChatGPT and Google Gemini, to assess for conciseness and understandability of information provided for the medical conditions Deep vein thrombosis, decubitus ulcers, and hemorrhoids. A cross-sectional original research design was conducted regarding the responses generated by ChatGPT and Google Gemini for the post-surgical complications of Deep vein thrombosis, decubitus ulcers, and hemorrhoids. Each response was evaluated by the Flesch-Kincaid calculator for total number of words, sentences, average words per sentence, average syllables per word, grade level, and ease score. Additionally, the similarity score was evaluated using QuillBot and reliability using a modified discern score. These results were then analyzed by the unpaired or two sample t-test to compare the averages between the two AI tools to conclude which one was superior. Chat GPT required a higher education level to understand as suggested by the higher grade levels and lower ease scores. The easiest brochure was for deep vein thrombosis which had the lowest ease score and highest grade level. ChatGPT displayed more similarity with information provided on the internet as calculated by the plagiarism calculator-Quill bot. The reliability score via the Modified Discern score showing both AI tools were similar. Although there is a difference in the various scores for each AI tool, based on the P values obtained there is not enough evidence to conclude the superiority of one AI tool over the other.</p>","PeriodicalId":23391,"journal":{"name":"Updates in Surgery","volume":" ","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Updates in Surgery","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s13304-025-02074-8","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"SURGERY","Score":null,"Total":0}
引用次数: 0
Abstract
There is a growing need for patients to easily access information about their medical conditions in order to improve their understanding of, and participation in, health care decisions. Artificial Intelligence (AI) has proven to be a fast, efficient, and effective tool for educating patients about their health conditions. The aim of this study is to compare the responses provided by two AI tools, ChatGPT and Google Gemini, and to assess the conciseness and understandability of the information they provide on deep vein thrombosis, decubitus ulcers, and hemorrhoids. A cross-sectional original research design was used to evaluate the responses generated by ChatGPT and Google Gemini for the post-surgical complications deep vein thrombosis, decubitus ulcers, and hemorrhoids. Each response was evaluated with the Flesch-Kincaid calculator for total number of words, number of sentences, average words per sentence, average syllables per word, grade level, and ease score. Additionally, similarity was assessed using QuillBot and reliability using a modified DISCERN score. These results were then analyzed with an unpaired (two-sample) t-test to compare the averages between the two AI tools and determine whether one was superior. ChatGPT required a higher education level to understand, as suggested by its higher grade levels and lower ease scores. The easiest brochure was the one for deep vein thrombosis, which had the lowest ease score and highest grade level. ChatGPT displayed more similarity with information available on the internet, as calculated by the plagiarism calculator QuillBot. The reliability scores obtained with the modified DISCERN instrument were similar for both AI tools. Although the various scores differ between the two AI tools, based on the P values obtained there is not enough evidence to conclude that one AI tool is superior to the other.
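For readers who want to see how the reported metrics fit together, the sketch below is a minimal Python illustration, not the authors' actual pipeline. It applies the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas to word, sentence, and syllable counts, and compares the two chatbots with an unpaired two-sample t-test via scipy.stats.ttest_ind; the score values used are hypothetical placeholders, not the study's data.

```python
# Minimal sketch of the abstract's metrics (assumed workflow, not the authors' code).
from scipy import stats

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate U.S. school grade needed to read the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical per-condition ease scores for illustration only
# (e.g. DVT, decubitus ulcers, hemorrhoids); the real values are in the paper.
chatgpt_ease = [45.2, 38.7, 41.9]
gemini_ease  = [55.4, 49.1, 52.3]

# Unpaired (two-sample) t-test comparing the two AI tools, as described in the abstract.
t_stat, p_value = stats.ttest_ind(chatgpt_ease, gemini_ease)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

A p-value above the chosen significance threshold would, as the abstract concludes, leave insufficient evidence to call either tool superior.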
Journal introduction
Updates in Surgery (UPIS) was founded in 2010 as the official journal of the Italian Society of Surgery. It is an international, English-language, peer-reviewed journal dedicated to the surgical sciences. Its main goal is to offer a valuable update on the most recent developments in rapidly evolving surgical techniques, which compel the community of surgeons to rigorous debate and continuous refinement of standards of care. In this respect, position papers on the most debated surgical approaches and accreditation criteria have been published and remain welcome in the future.
Besides its focus on general surgery, the journal pays particular attention to cutting-edge topics and emerging surgical fields, which are published in monothematic issues guest edited by well-known experts.
Updates in Surgery considers various types of papers: editorials, comprehensive reviews, original studies, and technical notes related to specific surgical procedures and techniques in liver, colorectal, gastric, pancreatic, robotic, and bariatric surgery.