{"title":"Evaluating the novel role of ChatGPT-4 in addressing corneal ulcer queries: An AI-powered insight.","authors":"Bharat Gurnani, Kirandeep Kaur, Prasanth Gireesh, Logesh Balakrishnan, Chitaranjan Mishra","doi":"10.1177/11206721251337290","DOIUrl":null,"url":null,"abstract":"<p><p>PurposeChatGPT-4, a natural language processing-based AI model, is increasingly being applied in healthcare, facilitating education, research, and clinical decision-making support. This study explores ChatGPT-4's capability to deliver accurate and detailed information on corneal ulcers, assessing its application in medical education and clinical decision-making.MethodsThe study engaged ChatGPT-4 with 12 structured questions across different categories related to corneal ulcers. For each inquiry, five unique ChatGPT-4 sessions were initiated, ensuring that the output was not affected by previous queries. A panel of five ophthalmology experts including optometry teaching and research staff assessed the responses using a Likert scale (1-5) (1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good) for quality and accuracy. Median scores were calculated, and inter-rater reliability was assessed to gauge consistency among evaluators.ResultsThe evaluation of ChatGPT-4's responses to corneal ulcer-related questions revealed varied performance across categories. Median scores were consistently high (4.0) for risk factors, etiology, symptoms, treatment, complications, and prognosis, with narrow IQRs (3.0-4.0), reflecting strong agreement. However, classification and investigations scored slightly lower (median 3.0). Signs of corneal ulcers had a median of 2.0, showing significant variability. Of 300 responses, 45% were rated 'good,' 41.7% 'acceptable,' 10% 'poor,' and only 3.3% 'very good,' highlighting areas for improvement. Notably, Evaluator 2 gave 35 'good' ratings, while Evaluators 1 and 3 assigned 10 'poor' ratings each. Inter-evaluator variability, along with gaps in diagnostic precision, underscores the need for refining AI responses. Continuous feedback and targeted adjustments could boost ChatGPT-4's utility in delivering high-quality ophthalmic education.ConclusionChatGPT-4 shows promising utility in providing educational content on corneal ulcers. Despite the variance in evaluator ratings, the numerical analysis suggests that with further refinement, ChatGPT-4 could be a valuable tool in ophthalmological education and clinical support.</p>","PeriodicalId":12000,"journal":{"name":"European Journal of Ophthalmology","volume":" ","pages":"1531-1541"},"PeriodicalIF":1.4000,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/11206721251337290","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/28 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Purpose: ChatGPT-4, a natural language processing-based AI model, is increasingly applied in healthcare to support education, research, and clinical decision-making. This study explores ChatGPT-4's ability to deliver accurate and detailed information on corneal ulcers and assesses its potential in medical education and clinical decision support.

Methods: ChatGPT-4 was queried with 12 structured questions spanning different categories related to corneal ulcers. Each question was posed in five separate ChatGPT-4 sessions so that no output was influenced by previous queries. A panel of five ophthalmology experts, including optometry teaching and research staff, rated each response for quality and accuracy on a 5-point Likert scale (1: very poor; 2: poor; 3: acceptable; 4: good; 5: very good). Median scores were calculated, and inter-rater reliability was assessed to gauge consistency among evaluators.

Results: ChatGPT-4's performance varied across categories. Median scores were consistently high (4.0) for risk factors, etiology, symptoms, treatment, complications, and prognosis, with narrow IQRs (3.0-4.0) reflecting strong agreement. Classification and investigations scored slightly lower (median 3.0), and signs of corneal ulcers had a median of 2.0 with substantial variability. Of the 300 ratings (12 questions × 5 sessions × 5 evaluators), 45% were 'good,' 41.7% 'acceptable,' 10% 'poor,' and only 3.3% 'very good,' highlighting room for improvement. Notably, Evaluator 2 gave 35 'good' ratings, while Evaluators 1 and 3 each assigned 10 'poor' ratings. Inter-evaluator variability, together with gaps in diagnostic precision, underscores the need to refine the AI's responses; continuous feedback and targeted adjustments could improve ChatGPT-4's utility for high-quality ophthalmic education.

Conclusion: ChatGPT-4 shows promise as a source of educational content on corneal ulcers. Despite the variance in evaluator ratings, the numerical analysis suggests that, with further refinement, ChatGPT-4 could become a valuable tool for ophthalmological education and clinical support.
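As a rough illustration of the aggregation step the Methods describe (per-category medians and IQRs over a 5-evaluator × 5-session Likert grid), here is a minimal Python sketch on synthetic data. The category names mirror the abstract; the rating values and the simple agreement proxy are assumptions, since the abstract does not state which inter-rater statistic the authors used.

```python
# Minimal sketch of the scoring analysis, on illustrative data only.
# The ratings below are randomly generated, not the study's dataset.
import numpy as np

rng = np.random.default_rng(0)

categories = [
    "risk factors", "etiology", "symptoms", "treatment",
    "complications", "prognosis", "classification",
    "investigations", "signs",
]

n_evaluators, n_sessions = 5, 5  # five experts, five fresh ChatGPT-4 sessions

for cat in categories:
    # Hypothetical Likert ratings (1-5): one row per evaluator, one column per session.
    ratings = rng.integers(2, 6, size=(n_evaluators, n_sessions))

    median = np.median(ratings)
    q1, q3 = np.percentile(ratings, [25, 75])  # IQR bounds, as reported per category

    # Crude stand-in for inter-rater reliability: share of ratings within one
    # Likert point of the category median (the paper's exact statistic is not
    # given in the abstract).
    agreement = np.mean(np.abs(ratings - median) <= 1)

    print(f"{cat:15s} median={median:.1f} IQR=({q1:.1f}-{q3:.1f}) agreement={agreement:.2f}")
```

In the study itself, each category's median and IQR were computed from the pooled evaluator ratings; a formal reliability coefficient (e.g., a kappa-type statistic) would replace the agreement proxy above.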
Journal description:
The European Journal of Ophthalmology was founded in 1991 and is issued bi-monthly in print. It publishes only peer-reviewed original research reporting clinical observations and laboratory investigations of clinical relevance, focusing on new diagnostic and surgical techniques, instrument and therapy updates, results of clinical trials, and research findings.