Hongkang Wu, Zichang Su, Xiangji Pan, An Shao, Yufeng Xu, Yao Wang, Kai Jin, Juan Ye
Title: Enhancing diabetic retinopathy query responses: assessing large language model in ophthalmology
Journal: British Journal of Ophthalmology (JCR Q1, Ophthalmology; impact factor 3.7)
DOI: 10.1136/bjo-2024-325861
Publication date: 30 June 2025
Publication type: Journal Article
Citations: 0
Abstract
Background: Diabetic retinopathy (DR) is a leading cause of blindness, with an increasing reliance on large language models (LLMs) for health-related information. The specificity of LLM-generated responses to DR queries is yet to be established, prompting an investigation into their suitability for ophthalmological contexts.

Methods: A cross-sectional study involving six LLMs was conducted to ascertain the accuracy and comprehensiveness of responses to 42 DR-related questions from 1 February 2024 to 31 March 2024. Three consultant-level ophthalmologists independently assessed the responses, grading them on accuracy and comprehensiveness. Additionally, the self-correction capability and readability of the responses were analysed statistically.

Results: An analysis of 252 responses from six LLMs showed an average word count ranging from 155.3 to 304.3 and an average character count ranging from 975.3 to 2043.5. The readability scores showed significant variability, with ChatGPT-3.5 displaying the lowest readability level. The accuracy of the responses was high, with ChatGPT-4.0 receiving 97.6% good ratings and no 'poor' grades for the top three models. After introducing a self-correction prompt, the average accuracy score demonstrated a significant improvement, increasing from 6.4 to 7.5.

Conclusion: LLMs have the potential to provide accurate and comprehensive responses to DR-related questions, making them advantageous for ophthalmology applications. However, before clinical integration, further refinement is needed to address readability, and continuous validation assessments are imperative to ensure reliability.

Data availability: Data are available upon reasonable request.
About the journal:
The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology. It also provides major reviews and publishes manuscripts covering regional issues in a global context.