{"title":"\"Comparative analysis of large language models against the NHS 111 online triaging for emergency ophthalmology\".","authors":"Shaheryar Ahmed Khan, Chrishan Gunasekera","doi":"10.1038/s41433-025-03605-8","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>This study presents a comprehensive evaluation of the performance of various large language models in generating responses for ophthalmology emergencies and compares their accuracy with the established United Kingdom's National Health Service 111 online system.</p><p><strong>Methods: </strong>We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triaging algorithm. These questions were based on four different ophthalmology emergency themes as laid out in the NHS 111 algorithm. Responses generated from NHS 111 online, were compared to different LLM-chatbots responses to determine the accuracy of LLM responses. We included a range of models including ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0. The accuracy of each LLM-chatbot response was compared against the NHS 111 Triage using a two-prompt strategy. Answers were graded as following: -2 graded as \"Very poor\", -1 as \"Poor\", O as \"No response\", 1 as \"Good\", 2 as \"Very good\" and 3 graded as \"Excellent\".</p><p><strong>Results: </strong>Overall LLMs' attained a good accuracy in this study compared against the NHS 111 responses. The score of ≥1 graded as \"Good\" was achieved by 93% responses of all LLMs. This refers to at least part of this answer having correct information as well as absence of any wrong information. There was no marked difference and very similar results seen overall on both prompts.</p><p><strong>Conclusions: </strong>The high accuracy and safety observed in LLM responses support their potential as effective tools for providing timely information and guidance to patients. LLMs hold promise in enhancing patient care and healthcare accessibility in digital age.</p>","PeriodicalId":12125,"journal":{"name":"Eye","volume":" ","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Eye","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1038/s41433-025-03605-8","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Background: This study presents a comprehensive evaluation of the performance of various large language models (LLMs) in generating responses to ophthalmology emergencies and compares their accuracy against the United Kingdom's established National Health Service (NHS) 111 online system.
Methods: We included 21 ophthalmology-related emergency scenario questions from the NHS 111 triaging algorithm. These questions were based on four different ophthalmology emergency themes as laid out in the NHS 111 algorithm. Responses generated from NHS 111 online were compared with the responses of different LLM chatbots to determine the accuracy of the LLM responses. We included a range of models: ChatGPT-3.5, Google Bard, Bing Chat, and ChatGPT-4.0. The accuracy of each LLM chatbot response was compared against the NHS 111 triage using a two-prompt strategy. Answers were graded as follows: -2 as "Very poor", -1 as "Poor", 0 as "No response", 1 as "Good", 2 as "Very good", and 3 as "Excellent".
Results: Overall, the LLMs attained good accuracy in this study when compared against the NHS 111 responses. A score of ≥1, graded as "Good", was achieved by 93% of all LLM responses. This indicates that at least part of the answer contained correct information and that no incorrect information was present. Results were very similar overall across both prompts, with no marked difference.
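The study's -2 to 3 grading scale and the "score of ≥1 counts as Good" accuracy figure amount to a simple aggregation over graded responses. The Python sketch below is purely illustrative; the authors do not publish code, and the function name, variable names, and example grades are hypothetical. It only shows how such a proportion could be tallied.

```python
# Illustrative sketch only: grading scale from the paper (-2 to 3);
# all names and example data below are hypothetical, not from the study.

GRADE_LABELS = {
    -2: "Very poor",
    -1: "Poor",
     0: "No response",
     1: "Good",
     2: "Very good",
     3: "Excellent",
}

def proportion_good(grades):
    """Return the fraction of graded responses scoring >= 1 (at least 'Good')."""
    if not grades:
        return 0.0
    return sum(1 for g in grades if g >= 1) / len(grades)

# Hypothetical grades for one LLM across the 21 scenario questions.
example_grades = [3, 2, 2, 1, 3, 2, 1, 1, 2, 3, 2, 1, 0, 2, 3, 1, 2, 2, 1, 3, 2]
print(f"{proportion_good(example_grades):.0%} of responses graded 'Good' or better")
```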
Conclusions: The high accuracy and safety observed in LLM responses support their potential as effective tools for providing timely information and guidance to patients. LLMs hold promise for enhancing patient care and healthcare accessibility in the digital age.
About the journal:
Eye seeks to provide the international practising ophthalmologist with high-quality articles, of academic rigour, on the latest global clinical and laboratory-based research. Its core aim is to advance the science and practice of ophthalmology with the latest clinically and scientifically based research. Whilst principally aimed at the practising clinician, the journal contains material of interest to a wider readership including optometrists, orthoptists, other health care professionals and research workers in all aspects of the field of visual science worldwide. Eye is the official journal of The Royal College of Ophthalmologists.
Eye encourages the submission of original articles covering all aspects of ophthalmology including: external eye disease; oculo-plastic surgery; orbital and lacrimal disease; ocular surface and corneal disorders; paediatric ophthalmology and strabismus; glaucoma; medical and surgical retina; neuro-ophthalmology; cataract and refractive surgery; ocular oncology; ophthalmic pathology; ophthalmic genetics.