Comparison of ChatGPT-4, Microsoft Copilot, and Google Gemini for Pediatric Ophthalmology Questions.

IF: 1.0 | CAS: Tier 4 (Medicine) | JCR: Q4 (Ophthalmology)
Tevfik Serhat Bahar, Olgar Öcal, Asli Çetinkaya Yaprak
{"title":"Comparison of ChatGPT-4, Microsoft Copilot, and Google Gemini for Pediatric Ophthalmology Questions.","authors":"Tevfik Serhat Bahar, Olgar Öcal, Asli Çetinkaya Yaprak","doi":"10.3928/01913913-20250404-03","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate the success of Chat Generative Pre-trained Transformer (ChatGPT; OpenAl), Google Gemini (Alphabet, Inc), and Microsoft Copilot (Microsoft Corporation) artificial intelligence (AI) programs, which are offered free of charge by three different manufacturers, in answering questions related to pediatric ophthalmology correctly and to investigate whether they are superior to each other.</p><p><strong>Methods: </strong>ChatGPT, Gemini, and Copilot were each asked 100 multiple-choice questions from the Ophtho-Questions online question bank, which is widely used for preparing for the high-stakes Ophthalmic Knowledge Evaluation Program examination. Their answers were compared to the official answer keys and categorized as correct or incorrect. The readability of the responses was assessed using the Flesch-Kincaid Grade Level, Flesch Reading Ease Score, and the Coleman-Liau Index.</p><p><strong>Results: </strong>ChatGPT, Gemini, and Copilot chatbots answered 61 (61%), 60 (60%), and 74 (74%) questions correctly, respectively. The Copilot AI program had a significantly higher rate of correct answers to questions than ChatGPT and Gemini (<i>P</i> = .049 and .035). Three readability analyses revealed that Copilot had the highest average score, followed by ChatGPT and Gemini, which were more challenging than the recommended level.</p><p><strong>Conclusions: </strong>Although AI chatbots can serve as useful tools for acquiring information on pediatric ophthalmology, their responses should be interpreted with caution due to potential inaccuracies. <b>[<i>J Pediatr Ophthalmol Strabismus</i>. 20XX;X(X):XXX-XXX.]</b>.</p>","PeriodicalId":50095,"journal":{"name":"Journal of Pediatric Ophthalmology & Strabismus","volume":" ","pages":"1-7"},"PeriodicalIF":1.0000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Pediatric Ophthalmology & Strabismus","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3928/01913913-20250404-03","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: To evaluate how accurately the Chat Generative Pre-trained Transformer (ChatGPT; OpenAI), Google Gemini (Alphabet, Inc), and Microsoft Copilot (Microsoft Corporation) artificial intelligence (AI) programs, each offered free of charge by a different manufacturer, answer pediatric ophthalmology questions, and to investigate whether any of them is superior to the others.

Methods: ChatGPT, Gemini, and Copilot were each asked 100 multiple-choice questions from the OphthoQuestions online question bank, which is widely used to prepare for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) examination. Their answers were compared against the official answer key and categorized as correct or incorrect. The readability of the responses was assessed using the Flesch-Kincaid Grade Level, the Flesch Reading Ease Score, and the Coleman-Liau Index.
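All three readability indices are closed-form formulas over simple text statistics (words, sentences, syllables, letters). Below is a minimal sketch of how such scores can be computed; the regex tokenizer and the vowel-group syllable counter are crude illustrative heuristics, not the tooling the authors report using:

```python
import re

def text_stats(text):
    """Rough counts of words, sentences, letters, and syllables."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_words = max(1, len(words))                          # avoid division by zero
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n_letters = sum(len(w) for w in words)
    # Crude syllable heuristic: count vowel groups, at least 1 per word.
    n_syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return n_words, n_sentences, n_letters, n_syllables

def readability(text):
    """Flesch Reading Ease, Flesch-Kincaid Grade Level, Coleman-Liau Index."""
    w, s, letters, syl = text_stats(text)
    fre  = 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)
    fkgl = 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
    cli  = 0.0588 * (100 * letters / w) - 0.296 * (100 * s / w) - 15.8
    return fre, fkgl, cli

fre, fkgl, cli = readability("The retina converts light into neural signals.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}, CLI = {cli:.1f}")
```

Higher Flesch Reading Ease scores indicate easier text, whereas higher grade-level and Coleman-Liau values indicate harder text, which is why the three indices are reported together.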

Results: The ChatGPT, Gemini, and Copilot chatbots answered 61 (61%), 60 (60%), and 74 (74%) of the questions correctly, respectively. Copilot had a significantly higher rate of correct answers than ChatGPT and Gemini (P = .049 and P = .035, respectively). Across all three readability analyses, Copilot had the highest average score, followed by ChatGPT and Gemini, and the responses were more challenging than the recommended reading level.
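The abstract does not name the statistical test behind these P values. As a worked check, a two-proportion chi-square test without continuity correction on the reported counts reproduces them closely; the 2x2 tables and the choice of test below are assumptions for illustration, not taken from the paper:

```python
from scipy.stats import chi2_contingency

correct = {"ChatGPT": 61, "Gemini": 60, "Copilot": 74}  # correct answers per chatbot
N = 100                                                  # questions asked of each

def compare(a, b):
    """P value for a 2x2 correct/incorrect table, no continuity correction."""
    table = [[correct[a], N - correct[a]],
             [correct[b], N - correct[b]]]
    _, p, _, _ = chi2_contingency(table, correction=False)
    return p

print(f"Copilot vs ChatGPT: P = {compare('Copilot', 'ChatGPT'):.4f}")  # ~0.0497
print(f"Copilot vs Gemini:  P = {compare('Copilot', 'Gemini'):.4f}")   # ~0.0353
```

These values round to the reported .049 and .035, consistent with a simple pairwise comparison of correct-answer proportions.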

Conclusions: Although AI chatbots can serve as useful tools for acquiring information on pediatric ophthalmology, their responses should be interpreted with caution due to potential inaccuracies. [J Pediatr Ophthalmol Strabismus. 20XX;X(X):XXX-XXX.].

Source Journal

Journal of Pediatric Ophthalmology & Strabismus. CiteScore: 1.80; self-citation rate: 8.30%; annual publications: 115; review time: >12 weeks.

The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. The Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and the treatment of strabismus in all age groups for over 50 years.