A comparative analysis of GPT-3.5 and GPT-4.0 on a multiple-choice ophthalmology question bank: A study on artificial intelligence developments.

Romanian journal of ophthalmology Pub Date : 2024-10-01 DOI:10.22336/rjo.2024.67

Suleyman Demir

Volume 68, issue 4, pages 367-371. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809821/pdf/


Abstract

Introduction: To evaluate and compare the performance of ChatGPT-4.0 and ChatGPT-3.5 in answering multiple-choice questions from OphthoQuestions (www.ophthoquestions.com), a popular question bank for exam preparation.

Methods: In January 2024, 520 questions were selected from the 4,551 questions available on OphthoQuestions (www.ophthoquestions.com), accessed through a personal account. The set was assembled by randomly selecting 40 questions from each of 13 ophthalmology subspecialties. GPT-3.5 and GPT-4.0 were each asked to answer the same 520 questions.
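The selection described in Methods amounts to a stratified random draw: a fixed number of questions from each subspecialty. A minimal sketch of that idea follows. The `bank` dictionary, the placeholder subspecialty names, the group sizes, and the seed are all assumptions made for illustration, since the abstract does not describe how the draw was actually carried out.

```python
import random

def sample_per_subspecialty(bank, per_group=40, seed=0):
    """Draw `per_group` question IDs at random from each subspecialty.

    `bank` maps a subspecialty name to its list of question IDs
    (a hypothetical structure; the real bank's layout is not described).
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    selected = {}
    for name, question_ids in bank.items():
        if len(question_ids) < per_group:
            raise ValueError(f"{name} has fewer than {per_group} questions")
        # sample() draws without replacement, so no question repeats
        selected[name] = rng.sample(question_ids, per_group)
    return selected

# Toy bank: 13 placeholder subspecialties with 350 questions each.
# These sizes are made up; they only roughly match the 4,551 in the source.
toy_bank = {
    f"subspecialty_{i}": [f"s{i}_q{j}" for j in range(350)]
    for i in range(1, 14)
}
chosen = sample_per_subspecialty(toy_bank)
total_selected = sum(len(ids) for ids in chosen.values())
print(total_selected)  # 13 groups x 40 questions = 520
```

Drawing the same number from every subspecialty keeps each one equally represented, whatever its share of the full bank.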

Results: Of the 520 questions, ChatGPT-4.0 answered 408 correctly (78.46%; 95% CI: 70% to 88%) and ChatGPT-3.5 answered 333 correctly (64.15%; 95% CI: 53% to 74%). GPT-4.0 answered significantly more questions correctly than GPT-3.5 (p = 0.0195). The difference in correct answers between ChatGPT-4.0 and ChatGPT-3.5 was also statistically significant in all subgroup analyses (p < 0.05).
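The percentages in Results can be checked directly from the counts. The sketch below recomputes them and, as an illustration only, attaches a Wilson score interval to each pooled proportion. The abstract does not say how its confidence intervals were derived, so the Wilson method is an assumption here and is not expected to reproduce the reported bounds. By this arithmetic, 333 of 520 is about 64.04%, slightly below the 64.15% stated.

```python
import math

def accuracy_percent(correct, total):
    """Share of correct answers, in percent."""
    return 100.0 * correct / total

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion, in percent.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return 100 * (centre - half), 100 * (centre + half)

# Counts taken from the Results paragraph.
gpt4_acc = accuracy_percent(408, 520)   # about 78.46
gpt35_acc = accuracy_percent(333, 520)  # about 64.04
gpt4_ci = wilson_interval(408, 520)
gpt35_ci = wilson_interval(333, 520)

print(f"GPT-4.0: {gpt4_acc:.2f}% (Wilson 95% CI {gpt4_ci[0]:.1f} to {gpt4_ci[1]:.1f})")
print(f"GPT-3.5: {gpt35_acc:.2f}% (Wilson 95% CI {gpt35_ci[0]:.1f} to {gpt35_ci[1]:.1f})")
```

Because both models answered the same 520 questions, a paired comparison would need the per-question results, which the abstract does not provide.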

Discussion: This study provides encouraging new evidence of ChatGPT's ability to handle complex clinical and medical data, with a focus on the development and consistency of artificial intelligence algorithms. The statistically significant advantage of GPT-4.0 over GPT-3.5 should be viewed in light of future algorithmic advances, which will continue to accumulate. This matters particularly for online tests, where the use of artificial intelligence poses an increasing danger to test integrity. Protocols such as mandatory proctoring should be considered. In the coming years, further research should examine ChatGPT's clinical management and decision-making capabilities, which may make it a beneficial resource for ophthalmologists and other medical professionals seeking information and guidance on challenging cases.

Conclusions: GPT-4.0 gave more correct and more consistent answers than GPT-3.5 on a multiple-choice ophthalmology question bank. The two ChatGPT versions differed significantly in accuracy and repeatability when handling questions related to eye diseases. This study suggests that newer artificial intelligence algorithms are promising. More data are needed before artificial intelligence language models can be used in medical applications.
