{"title":"Comparative analysis of large language models in providing patient information about keratoconus and contact lenses.","authors":"Yavuz Kemal Aribas, Atike Burcin Tefon Aribas","doi":"10.1007/s10792-025-03711-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>To evaluate the accuracy, completeness, informational quality, and readability of responses generated by large language models (LLMs)-ChatGPT (OpenAI, USA), Gemini (Google, USA), and Copilot (Microsoft, USA)-to patient questions concerning keratoconus and contact lens use.</p><p><strong>Methods: </strong>In this cross-sectional study, 32 questions across eight domains were posed to the free versions of each model. Two independent ophthalmologists rated accuracy (6-point Likert scale) and completeness (3-point Likert scale). Information quality was assessed using the DISCERN instrument, and readability was evaluated with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Inter-rater agreement was measured with Cohen's Kappa.</p><p><strong>Results: </strong>Inter-rater reliability showed at least fair agreement for all LLMs. (min κ = 0.365) ChatGPT achieved significantly higher accuracy than Gemini (p < 0.001) and Copilot (p = 0.010), and higher completeness than Gemini (p = 0.001) but was similar to Copilot (p = 0.101). DISCERN scores were highest for ChatGPT (64), followed by Copilot (61) and Gemini (55). All models produced difficult-to-read content (FRES: Gemini 49.7, Copilot 45.4, ChatGPT 40.7), with FKGL values at late high school level.</p><p><strong>Conclusion: </strong>All evaluated large language models were capable of providing generally accurate and thorough information regarding keratoconus and contact lens use. Nevertheless, limitations in readability across models highlight the importance of clinician oversight to ensure that patient education remains clear, accessible, and appropriately tailored to individual needs.</p>","PeriodicalId":14473,"journal":{"name":"International Ophthalmology","volume":"45 1","pages":"340"},"PeriodicalIF":1.4000,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1007/s10792-025-03711-2","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Objective: To evaluate the accuracy, completeness, informational quality, and readability of responses to patient questions concerning keratoconus and contact lens use generated by three large language models (LLMs): ChatGPT (OpenAI, USA), Gemini (Google, USA), and Copilot (Microsoft, USA).
Methods: In this cross-sectional study, 32 questions across eight domains were posed to the free versions of each model. Two independent ophthalmologists rated accuracy (6-point Likert scale) and completeness (3-point Likert scale). Information quality was assessed using the DISCERN instrument, and readability was evaluated with the Flesch Reading Ease Score (FRES) and Flesch-Kincaid Grade Level (FKGL). Inter-rater agreement was measured with Cohen's Kappa.
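For context, the two readability indices named above are fixed linear formulas over sentence, word, and syllable counts: FRES = 206.835 − 1.015 (words/sentence) − 84.6 (syllables/word), and FKGL = 0.39 (words/sentence) + 11.8 (syllables/word) − 15.59. A minimal Python sketch follows; the vowel-group syllable counter is a rough illustrative heuristic, not the tool used in the study.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> tuple[float, float]:
    """Return (FRES, FKGL) for a block of English text."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text) or ["a"]  # avoid division by zero
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # mean words per sentence
    spw = syllables / len(words)   # mean syllables per word
    fres = 206.835 - 1.015 * wps - 84.6 * spw
    fkgl = 0.39 * wps + 11.8 * spw - 15.59
    return fres, fkgl
```

On the standard FRES scale, scores of 30-50 read as "difficult" (college level), the band into which all three models' scores reported below fall.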
Results: Inter-rater reliability showed at least fair agreement for all LLMs (minimum κ = 0.365). ChatGPT achieved significantly higher accuracy than Gemini (p < 0.001) and Copilot (p = 0.010), and higher completeness than Gemini (p = 0.001) but similar completeness to Copilot (p = 0.101). DISCERN scores were highest for ChatGPT (64), followed by Copilot (61) and Gemini (55). All models produced difficult-to-read content (FRES: Gemini 49.7, Copilot 45.4, ChatGPT 40.7), with FKGL values at a late high school level.
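Cohen's kappa, used above for inter-rater agreement, corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's marginal rating frequencies: κ = (p_o − p_e) / (1 − p_e). A minimal sketch (the ratings in the usage line are hypothetical, not the study's data):

```python
from collections import Counter

def cohens_kappa(r1: list, r2: list) -> float:
    """Cohen's kappa for two raters scoring the same items."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n    # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)   # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 6-point accuracy ratings from two graders:
print(cohens_kappa([5, 6, 4, 5, 6, 3], [5, 6, 5, 5, 6, 4]))
```

On the widely used Landis and Koch scale, 0.21-0.40 counts as "fair" and 0.41-0.60 as "moderate" agreement, so the minimum κ of 0.365 reported above sits in the fair band.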
Conclusion: All evaluated large language models were capable of providing generally accurate and thorough information regarding keratoconus and contact lens use. Nevertheless, limitations in readability across models highlight the importance of clinician oversight to ensure that patient education remains clear, accessible, and appropriately tailored to individual needs.
Journal Introduction:
International Ophthalmology provides the clinician with articles on all the relevant subspecialties of ophthalmology, with a broad international scope. The emphasis is on presentation of the latest clinical research in the field. In addition, the journal includes regular sections devoted to new developments in technologies, products, and techniques.