Comparison of ChatGPT-4, Microsoft Copilot, and Google Gemini for Pediatric Ophthalmology Questions.

IF: 1.0 | CAS: Tier 4 (Medicine) | JCR: Q4 (Ophthalmology)
Tevfik Serhat Bahar, Olgar Öcal, Asli Çetinkaya Yaprak
{"title":"Comparison of ChatGPT-4, Microsoft Copilot, and Google Gemini for Pediatric Ophthalmology Questions.","authors":"Tevfik Serhat Bahar, Olgar Öcal, Asli Çetinkaya Yaprak","doi":"10.3928/01913913-20250404-03","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate the success of Chat Generative Pre-trained Transformer (ChatGPT; OpenAl), Google Gemini (Alphabet, Inc), and Microsoft Copilot (Microsoft Corporation) artificial intelligence (AI) programs, which are offered free of charge by three different manufacturers, in answering questions related to pediatric ophthalmology correctly and to investigate whether they are superior to each other.</p><p><strong>Methods: </strong>ChatGPT, Gemini, and Copilot were each asked 100 multiple-choice questions from the Ophtho-Questions online question bank, which is widely used for preparing for the high-stakes Ophthalmic Knowledge Evaluation Program examination. Their answers were compared to the official answer keys and categorized as correct or incorrect. The readability of the responses was assessed using the Flesch-Kincaid Grade Level, Flesch Reading Ease Score, and the Coleman-Liau Index.</p><p><strong>Results: </strong>ChatGPT, Gemini, and Copilot chatbots answered 61 (61%), 60 (60%), and 74 (74%) questions correctly, respectively. The Copilot AI program had a significantly higher rate of correct answers to questions than ChatGPT and Gemini (<i>P</i> = .049 and .035). Three readability analyses revealed that Copilot had the highest average score, followed by ChatGPT and Gemini, which were more challenging than the recommended level.</p><p><strong>Conclusions: </strong>Although AI chatbots can serve as useful tools for acquiring information on pediatric ophthalmology, their responses should be interpreted with caution due to potential inaccuracies. <b>[<i>J Pediatr Ophthalmol Strabismus</i>. 20XX;X(X):XXX-XXX.]</b>.</p>","PeriodicalId":50095,"journal":{"name":"Journal of Pediatric Ophthalmology & Strabismus","volume":" ","pages":"1-7"},"PeriodicalIF":1.0000,"publicationDate":"2025-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Pediatric Ophthalmology & Strabismus","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3928/01913913-20250404-03","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose: To evaluate how accurately the Chat Generative Pre-trained Transformer (ChatGPT; OpenAI), Google Gemini (Alphabet, Inc), and Microsoft Copilot (Microsoft Corporation) artificial intelligence (AI) programs, each offered free of charge by a different manufacturer, answer pediatric ophthalmology questions, and to investigate whether any of them is superior to the others.

Methods: ChatGPT, Gemini, and Copilot were each asked 100 multiple-choice questions from the OphthoQuestions online question bank, which is widely used to prepare for the high-stakes Ophthalmic Knowledge Assessment Program (OKAP) examination. Their answers were compared against the official answer key and categorized as correct or incorrect. The readability of the responses was assessed using the Flesch-Kincaid Grade Level, the Flesch Reading Ease Score, and the Coleman-Liau Index.
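All three readability indices are closed-form formulas over simple text statistics (words, sentences, syllables, letters). Below is a minimal sketch of how such scores can be computed; the regex tokenizer and the vowel-group syllable counter are crude illustrative heuristics, not the tooling the authors report using:

```python
import re

def text_stats(text):
    """Rough counts of words, sentences, letters, and syllables."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_words = max(1, len(words))                          # avoid division by zero
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    n_letters = sum(len(w) for w in words)
    # Crude syllable heuristic: count vowel groups, at least 1 per word.
    n_syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return n_words, n_sentences, n_letters, n_syllables

def readability(text):
    """Flesch Reading Ease, Flesch-Kincaid Grade Level, Coleman-Liau Index."""
    w, s, letters, syl = text_stats(text)
    fre  = 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)
    fkgl = 0.39 * (w / s) + 11.8 * (syl / w) - 15.59
    cli  = 0.0588 * (100 * letters / w) - 0.296 * (100 * s / w) - 15.8
    return fre, fkgl, cli

fre, fkgl, cli = readability("The retina converts light into neural signals.")
print(f"FRE = {fre:.1f}, FKGL = {fkgl:.1f}, CLI = {cli:.1f}")
```

Higher Flesch Reading Ease scores indicate easier text, whereas higher grade-level and Coleman-Liau values indicate harder text, which is why the three indices are reported together.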

Results: The ChatGPT, Gemini, and Copilot chatbots answered 61 (61%), 60 (60%), and 74 (74%) of the questions correctly, respectively. Copilot had a significantly higher rate of correct answers than ChatGPT and Gemini (P = .049 and P = .035, respectively). Across all three readability analyses, Copilot had the highest average score, followed by ChatGPT and Gemini, and the responses were more challenging than the recommended reading level.
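The abstract does not name the statistical test behind these P values. As a worked check, a two-proportion chi-square test without continuity correction on the reported counts reproduces them closely; the 2x2 tables and the choice of test below are assumptions for illustration, not taken from the paper:

```python
from scipy.stats import chi2_contingency

correct = {"ChatGPT": 61, "Gemini": 60, "Copilot": 74}  # correct answers per chatbot
N = 100                                                  # questions asked of each

def compare(a, b):
    """P value for a 2x2 correct/incorrect table, no continuity correction."""
    table = [[correct[a], N - correct[a]],
             [correct[b], N - correct[b]]]
    _, p, _, _ = chi2_contingency(table, correction=False)
    return p

print(f"Copilot vs ChatGPT: P = {compare('Copilot', 'ChatGPT'):.4f}")  # ~0.0497
print(f"Copilot vs Gemini:  P = {compare('Copilot', 'Gemini'):.4f}")   # ~0.0353
```

These values round to the reported .049 and .035, consistent with a simple pairwise comparison of correct-answer proportions.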

Conclusions: Although AI chatbots can serve as useful tools for acquiring information on pediatric ophthalmology, their responses should be interpreted with caution due to potential inaccuracies. [J Pediatr Ophthalmol Strabismus. 20XX;X(X):XXX-XXX.].

Source Journal

Journal of Pediatric Ophthalmology & Strabismus. CiteScore: 1.80; self-citation rate: 8.30%; annual publications: 115; review time: >12 weeks.

The Journal of Pediatric Ophthalmology & Strabismus is a bimonthly peer-reviewed publication for pediatric ophthalmologists. The Journal has published original articles on the diagnosis, treatment, and prevention of eye disorders in the pediatric age group and the treatment of strabismus in all age groups for over 50 years.