Enhancing diabetic retinopathy query responses: assessing large language model in ophthalmology

IF 3.7 · CAS Zone 2 (Medicine) · Q1 (Ophthalmology)
Hongkang Wu, Zichang Su, Xiangji Pan, An Shao, Yufeng Xu, Yao Wang, Kai Jin, Juan Ye
Journal: British Journal of Ophthalmology
DOI: 10.1136/bjo-2024-325861
Publication date: 2025-06-30
Publication type: Journal Article
Citations: 0

Abstract

Background
Diabetic retinopathy (DR) is a leading cause of blindness, with an increasing reliance on large language models (LLMs) for health-related information. The specificity of LLM-generated responses to DR queries is yet to be established, prompting an investigation into their suitability for ophthalmological contexts.

Methods
A cross-sectional study involving six LLMs was conducted to ascertain the accuracy and comprehensiveness of responses to 42 DR-related questions from 1 February 2024 to 31 March 2024. Three consultant-level ophthalmologists independently assessed the responses, grading them on accuracy and comprehensiveness. Additionally, the self-correction capability and readability of the responses were analysed statistically.

Results
An analysis of 252 responses from six LLMs showed an average word count ranging from 155.3 to 304.3 and an average character count ranging from 975.3 to 2043.5. The readability scores showed significant variability, with ChatGPT-3.5 displaying the lowest readability level. The accuracy of the responses was high, with ChatGPT-4.0 receiving 97.6% good ratings and no 'poor' grades for the top three models. After introducing a self-correction prompt, the average accuracy score demonstrated a significant improvement, increasing from 6.4 to 7.5.

Conclusion
LLMs have the potential to provide accurate and comprehensive responses to DR-related questions, making them advantageous for ophthalmology applications. However, before clinical integration, further refinement is needed to address readability, and continuous validation assessments are imperative to ensure reliability.

Data are available upon reasonable request.
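The abstract reports per-response word counts, character counts, and readability scores, but does not name the readability formula used. As a minimal sketch, assuming the widely used Flesch Reading Ease metric (with a crude vowel-group syllable heuristic), the per-response statistics could be computed like this; the function names and the syllable heuristic are illustrative assumptions, not the authors' method:

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count contiguous vowel groups; min one syllable per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    # A trailing silent 'e' usually does not add a syllable.
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words).
    # Higher scores mean easier text; medical jargon drives the score down.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

def response_stats(text: str) -> dict:
    # Word count, character count, and readability for one LLM response,
    # mirroring the three per-response measures reported in the abstract.
    words = re.findall(r"\S+", text)
    return {
        "word_count": len(words),
        "char_count": len(text),
        "flesch": round(flesch_reading_ease(text), 1),
    }
```

Averaging `response_stats` over all 252 responses, grouped by model, would reproduce the kind of per-model summary the Results section describes; a response heavy with terms like "photocoagulation" scores far lower on Flesch than plain patient-facing language.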
Source journal: British Journal of Ophthalmology
CiteScore: 10.30
Self-citation rate: 2.40%
Articles published: 213
Review turnaround: 3-6 weeks
Journal overview: The British Journal of Ophthalmology (BJO) is an international peer-reviewed journal for ophthalmologists and visual science specialists. BJO publishes clinical investigations, clinical observations, and clinically relevant laboratory investigations related to ophthalmology, along with major reviews and manuscripts covering regional issues in a global context.