{"title":"Artificial intelligence in pediatric ophthalmology: a comparative study of ChatGPT-4.0 and DeepSeek-R1 performance.","authors":"Gamze Karataş, Mehmet Egemen Karataş","doi":"10.1080/09273972.2025.2536782","DOIUrl":null,"url":null,"abstract":"<p><p><i>Objective</i>: This study aims to evaluate and compare the accuracy and performance of two large language models (LLMs), ChatGPT-4.0 and DeepSeek-R1, in answering pediatric ophthalmology-related questions. <i>Methods</i>: A total of 44 multiple-choice questions were selected, covering various subspecialties of pediatric ophthalmology. Both LLMs were tasked with answering these questions, and their responses were compared in terms of accuracy. <i>Results</i>: ChatGPT-4.0 correctly answered 82% of the questions, while DeepSeek-R1 achieved a higher accuracy rate of 93% (p: 0.06). In strabismus, ChatGPT-4.0 answered 70% of questions correctly, while DeepSeek-R1 achieved 82% (p: 0.50). In other subspecialties, ChatGPT-4.0 answered 89% correctly, and DeepSeek-R1 achieved 100% accuracy (p: 0.25). <i>Conclusion</i>: DeepSeek-R1 outperformed ChatGPT-4.0 in overall accuracy, particularly in pediatric ophthalmology. These findings suggest the need for further optimization of LLM models to enhance their performance and reliability in clinical settings, especially in pediatric ophthalmology.</p>","PeriodicalId":51700,"journal":{"name":"Strabismus","volume":" ","pages":"1-7"},"PeriodicalIF":0.8000,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Strabismus","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/09273972.2025.2536782","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Abstract
Objective: To evaluate and compare the accuracy of two large language models (LLMs), ChatGPT-4.0 and DeepSeek-R1, in answering pediatric ophthalmology questions. Methods: A total of 44 multiple-choice questions covering various subspecialties of pediatric ophthalmology were selected. Both LLMs answered the same questions, and their responses were compared for accuracy. Results: ChatGPT-4.0 answered 82% of the questions correctly, while DeepSeek-R1 achieved a higher accuracy of 93% (p = 0.06). On strabismus questions, ChatGPT-4.0 scored 70% versus 82% for DeepSeek-R1 (p = 0.50); on questions from the other subspecialties, ChatGPT-4.0 scored 89% versus 100% for DeepSeek-R1 (p = 0.25). Conclusion: DeepSeek-R1 achieved consistently higher accuracy than ChatGPT-4.0 across all question categories, although none of the differences reached statistical significance. These findings suggest that LLMs require further optimization to improve their performance and reliability in clinical settings, particularly in pediatric ophthalmology.
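The abstract does not name the statistical test used, but the reported p-values are consistent with an exact McNemar test on the two models' paired correct/incorrect answers (for example, 0 vs. 5 discordant pairs over the 44 questions yields p ≈ 0.06). Below is a minimal sketch of that comparison; the contingency counts are reconstructed from the reported percentages (36/44 = 82%, 41/44 = 93%) plus an assumed split of the discordant pairs, not data published in the paper.

```python
# Hedged sketch: exact McNemar test comparing two models' paired answers.
# The cell counts are inferred from the abstract's percentages and an
# assumed 0-vs-5 split of discordant pairs; the paper does not report
# this table explicitly.
from statsmodels.stats.contingency_tables import mcnemar

# Rows: ChatGPT-4.0 (correct, incorrect); columns: DeepSeek-R1 (correct, incorrect).
table = [
    [36, 0],  # both correct / only ChatGPT-4.0 correct (assumed)
    [5, 3],   # only DeepSeek-R1 correct / both incorrect (assumed)
]

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"p = {result.pvalue:.4f}")    # 0.0625, matching the reported p = 0.06
```

The same test on the subspecialty splits (an assumed 0 vs. 2 discordant strabismus pairs, and 0 vs. 3 in the other subspecialties) would reproduce the reported p-values of 0.50 and 0.25.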