Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model

Prashant D. Tailor MD, Timothy T. Xu MD, Blake H. Fortes MD, Raymond Iezzi MD, Timothy W. Olsen MD, Matthew R. Starr MD, Sophie J. Bakri MD, Brittni A. Scruggs MD, PhD, Andrew J. Barkmeier MD, Sanjay V. Patel MD, Keith H. Baratz MD, Ashlie A. Bernhisel MD, Lilly H. Wagner MD, Andrea A. Tooley MD, Gavin W. Roddy MD, PhD, Arthur J. Sit MD, Kristi Y. Wu MD, Erick D. Bothun MD, Sasha A. Mansukhani MBBS, Brian G. Mohney MD, Lauren A. Dalvin MD
{"title":"Appropriateness of Ophthalmology Recommendations From an Online Chat-Based Artificial Intelligence Model","authors":"Prashant D. Tailor MD ,&nbsp;Timothy T. Xu MD ,&nbsp;Blake H. Fortes MD ,&nbsp;Raymond Iezzi MD ,&nbsp;Timothy W. Olsen MD ,&nbsp;Matthew R. Starr MD ,&nbsp;Sophie J. Bakri MD ,&nbsp;Brittni A. Scruggs MD, PhD ,&nbsp;Andrew J. Barkmeier MD ,&nbsp;Sanjay V. Patel MD ,&nbsp;Keith H. Baratz MD ,&nbsp;Ashlie A. Bernhisel MD ,&nbsp;Lilly H. Wagner MD ,&nbsp;Andrea A. Tooley MD ,&nbsp;Gavin W. Roddy MD, PhD ,&nbsp;Arthur J. Sit MD ,&nbsp;Kristi Y. Wu MD ,&nbsp;Erick D. Bothun MD ,&nbsp;Sasha A. Mansukhani MBBS ,&nbsp;Brian G. Mohney MD ,&nbsp;Lauren A. Dalvin MD","doi":"10.1016/j.mcpdig.2024.01.003","DOIUrl":null,"url":null,"abstract":"<div><h3>Objective</h3><p>To determine the appropriateness of ophthalmology recommendations from an online chat-based artificial intelligence model to ophthalmology questions.</p></div><div><h3>Patients and Methods</h3><p>Cross-sectional qualitative study from April 1, 2023, to April 30, 2023. A total of 192 questions were generated spanning all ophthalmic subspecialties. Each question was posed to a large language model (LLM) 3 times. The responses were graded by appropriate subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. The first grading context was if the information was presented on a patient information site. The second was an LLM-generated draft response to patient queries sent by the electronic medical record (EMR). Appropriate was defined as accurate and specific enough to serve as a surrogate for physician-approved information. Main outcome measure was percentage of appropriate responses per subspecialty.</p></div><div><h3>Results</h3><p>For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Variable rates of average appropriateness were observed across ophthalmic subspecialties for patient information site information ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via EMR, the LLM provided an overall average of 74% appropriate responses and varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but insignificant variations, with disease and condition often rated highest (72% and 69%) for appropriateness and surgery-related (55% and 51%) lowest, in both contexts.</p></div><div><h3>Conclusion</h3><p>This LLM reported mostly appropriate responses across multiple ophthalmology subspecialties in the context of both patient information sites and EMR-related responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.</p></div>","PeriodicalId":74127,"journal":{"name":"Mayo Clinic Proceedings. 
Digital health","volume":"2 1","pages":"Pages 119-128"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S294976122400004X/pdfft?md5=5523855f19c376cfc730f0de31cbe918&pid=1-s2.0-S294976122400004X-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mayo Clinic Proceedings. Digital health","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S294976122400004X","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Objective

To determine the appropriateness of recommendations from an online chat-based artificial intelligence model in response to ophthalmology questions.

Patients and Methods

Cross-sectional qualitative study conducted from April 1, 2023, to April 30, 2023. A total of 192 questions spanning all ophthalmic subspecialties were generated. Each question was posed to a large language model (LLM) 3 times. Responses were graded by the relevant subspecialists as appropriate, inappropriate, or unreliable in 2 grading contexts. In the first context, the response was graded as information presented on a patient information site. In the second, the response was graded as an LLM-generated draft reply to a patient query sent through the electronic medical record (EMR). Appropriate was defined as sufficiently accurate and specific to serve as a surrogate for physician-approved information. The main outcome measure was the percentage of appropriate responses per subspecialty.
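The abstract does not include the study's querying or grading code; the following is a minimal illustrative sketch, in Python, of the repeated-query protocol described above. The query_llm function, the QuestionRecord structure, and the example question are hypothetical stand-ins, not the authors' implementation or the actual question bank.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the chat-based model's API; the study's actual
# model and client library are not specified in the abstract.
def query_llm(prompt: str) -> str:
    return f"[model response to: {prompt}]"

@dataclass
class QuestionRecord:
    subspecialty: str          # e.g., "glaucoma", "uveitis"
    category: str              # e.g., "disease and condition", "surgery-related"
    question: str
    responses: list = field(default_factory=list)

RUNS_PER_QUESTION = 3          # each question was posed to the model 3 times

def collect_responses(questions: list[QuestionRecord]) -> None:
    """Pose every question to the model the prescribed number of times."""
    for q in questions:
        for _ in range(RUNS_PER_QUESTION):
            q.responses.append(query_llm(q.question))

# Example with a single invented question; the study used 192 questions.
bank = [QuestionRecord("glaucoma", "treatment and management",
                       "What are the options for lowering eye pressure?")]
collect_responses(bank)
print(len(bank[0].responses))  # -> 3
```

Each collected response would then be graded independently in the two contexts (patient information site and EMR draft reply) by the corresponding subspecialist.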

Results

For patient information site-related questions, the LLM provided an overall average of 79% appropriate responses. Average appropriateness for patient information site questions varied across ophthalmic subspecialties, ranging from 56% to 100%: cataract or refractive (92%), cornea (56%), glaucoma (72%), neuro-ophthalmology (67%), oculoplastic or orbital surgery (80%), ocular oncology (100%), pediatrics (89%), vitreoretinal diseases (86%), and uveitis (65%). For draft responses to patient questions via the EMR, the LLM provided an overall average of 74% appropriate responses, which also varied by subspecialty: cataract or refractive (85%), cornea (54%), glaucoma (77%), neuro-ophthalmology (63%), oculoplastic or orbital surgery (62%), ocular oncology (90%), pediatrics (94%), vitreoretinal diseases (88%), and uveitis (55%). Stratifying grades across health information categories (disease and condition, risk and prevention, surgery-related, and treatment and management) showed notable but statistically nonsignificant variation in both contexts, with disease and condition most often rated highest for appropriateness (72% and 69%) and surgery-related rated lowest (55% and 51%).
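As a rough illustration of how the per-subspecialty percentages above could be tabulated from individual grades, the sketch below aggregates (subspecialty, context, grade) tuples. The grade labels follow the abstract, but the data rows and context names are invented for illustration and are not the study data.

```python
from collections import defaultdict

# Each graded response: (subspecialty, context, grade), where grade is one of
# "appropriate", "inappropriate", or "unreliable". The rows below are invented
# purely to show the calculation.
graded = [
    ("glaucoma", "patient_info_site", "appropriate"),
    ("glaucoma", "patient_info_site", "inappropriate"),
    ("uveitis", "emr_draft", "appropriate"),
    ("uveitis", "emr_draft", "unreliable"),
]

def appropriateness_by_subspecialty(rows, context):
    """Percentage of responses graded appropriate, per subspecialty, in one context."""
    counts = defaultdict(lambda: [0, 0])   # subspecialty -> [appropriate, total]
    for subspecialty, ctx, grade in rows:
        if ctx != context:
            continue
        counts[subspecialty][1] += 1
        counts[subspecialty][0] += grade == "appropriate"
    return {s: 100.0 * ok / total for s, (ok, total) in counts.items()}

print(appropriateness_by_subspecialty(graded, "patient_info_site"))
# -> {'glaucoma': 50.0}
```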

Conclusion

This LLM provided mostly appropriate responses across multiple ophthalmology subspecialties, both in the context of a patient information site and as EMR draft responses to patient questions. Current LLM offerings require optimization and improvement before widespread clinical use.
