ChatGPT for Addressing Patient-centered Frequently Asked Questions in Glaucoma Clinical Practice.

Henrietta Wang, Katherine Masselos, Janelle Tong, Heather R M Connor, Janelle Scully, Sophia Zhang, Daniel Rafla, Matteo Posarelli, Jeremy C K Tan, Ashish Agar, Michael Kalloniatis, Jack Phu
{"title":"ChatGPT for Addressing Patient-centered Frequently Asked Questions in Glaucoma Clinical Practice.","authors":"Henrietta Wang, Katherine Masselos, Janelle Tong, Heather R M Connor, Janelle Scully, Sophia Zhang, Daniel Rafla, Matteo Posarelli, Jeremy C K Tan, Ashish Agar, Michael Kalloniatis, Jack Phu","doi":"10.1016/j.ogla.2024.10.005","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Large language models such as ChatGPT-3.5 are often used by the public to answer questions related to daily life, including health advice. This study evaluated the responses of ChatGPT-3.5 in answering patient-centered frequently asked questions (FAQs) relevant in glaucoma clinical practice.</p><p><strong>Design: </strong>Prospective cross-sectional survey.</p><p><strong>Participants: </strong>Expert graders.</p><p><strong>Methods: </strong>Twelve experts across a range of clinical, education, and research practices in optometry and ophthalmology. Over 200 patient-centric FAQs from authoritative professional society, hospital and advocacy websites were distilled and filtered into 40 questions across 4 themes: definition and risk factors, diagnosis and testing, lifestyle and other accompanying conditions, and treatment and follow-up. The questions were individually input into ChatGPT-3.5 to generate responses. The responses were graded by the 12 experts individually.</p><p><strong>Main outcome measures: </strong>A 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) was used to grade ChatGPT-3.5 responses across 4 domains: coherency, factuality, comprehensiveness, and safety.</p><p><strong>Results: </strong>Across all themes and domains, median scores were all 4 (\"agree\"). Comprehensiveness had the lowest scores across domains (mean 3.7 ± 0.9), followed by factuality (mean 3.9 ± 0.9) and coherency and safety (mean 4.1 ± 0.8 for both). Examination of the individual 40 questions showed that 8 (20%), 17 (42.5%), 24 (60%), and 8 (20%) of the questions had average scores below 4 (i.e., below \"agree\") for the coherency, factuality, comprehensiveness, and safety domains, respectively. Free-text comments by the experts highlighted omissions of facts and comprehensiveness (e.g., secondary glaucoma) and remarked on the vagueness of some responses (i.e., that the response did not account for individual patient circumstances).</p><p><strong>Conclusions: </strong>ChatGPT-3.5 responses to FAQs in glaucoma were generally agreeable in terms of coherency, factuality, comprehensiveness, and safety. However, areas of weakness were identified, precluding recommendations for routine use to provide patients with tailored counseling in glaucoma, especially with respect to development of glaucoma and its management.</p><p><strong>Financial disclosure(s): </strong>Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.</p>","PeriodicalId":56368,"journal":{"name":"Ophthalmology. Glaucoma","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ophthalmology. 
Glaucoma","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.ogla.2024.10.005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Medicine","Score":null,"Total":0}

Abstract

Purpose: Large language models such as ChatGPT-3.5 are often used by the public to answer questions related to daily life, including health advice. This study evaluated the responses of ChatGPT-3.5 to patient-centered frequently asked questions (FAQs) relevant to glaucoma clinical practice.

Design: Prospective cross-sectional survey.

Participants: Expert graders.

Methods: Twelve experts across a range of clinical, education, and research practices in optometry and ophthalmology served as graders. Over 200 patient-centered FAQs from authoritative professional society, hospital, and advocacy websites were distilled and filtered into 40 questions across 4 themes: definition and risk factors, diagnosis and testing, lifestyle and other accompanying conditions, and treatment and follow-up. The questions were input individually into ChatGPT-3.5 to generate responses, which each of the 12 experts then graded independently.
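The abstract does not specify how the questions were submitted to ChatGPT-3.5. As an illustrative sketch only, the querying step could be reproduced programmatically through the OpenAI API; the model name "gpt-3.5-turbo" and the sample questions below are assumptions, not details taken from the paper:

```python
# Illustrative sketch only: the study input questions into ChatGPT-3.5 directly;
# this shows how the querying step might be reproduced via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical examples of the 40 patient-centered FAQs, one per theme
# (the paper's actual question list is in its supplementary material).
questions = [
    "What is glaucoma and what are its risk factors?",      # definition and risk factors
    "What tests are used to diagnose glaucoma?",            # diagnosis and testing
    "Can lifestyle changes affect my glaucoma?",            # lifestyle and accompanying conditions
    "How is glaucoma treated and how often is follow-up?",  # treatment and follow-up
]

responses = {}
for q in questions:
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed stand-in for ChatGPT-3.5
        messages=[{"role": "user", "content": q}],
    )
    responses[q] = completion.choices[0].message.content
```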

Main outcome measures: A 5-point Likert scale (1 = strongly disagree; 5 = strongly agree) was used to grade ChatGPT-3.5 responses across 4 domains: coherency, factuality, comprehensiveness, and safety.

Results: Across all themes and domains, median scores were all 4 ("agree"). Comprehensiveness had the lowest scores across domains (mean 3.7 ± 0.9), followed by factuality (mean 3.9 ± 0.9) and coherency and safety (mean 4.1 ± 0.8 for both). Examination of the 40 individual questions showed that 8 (20%), 17 (42.5%), 24 (60%), and 8 (20%) of the questions had average scores below 4 (i.e., below "agree") for the coherency, factuality, comprehensiveness, and safety domains, respectively. Free-text comments by the experts highlighted factual omissions and gaps in comprehensiveness (e.g., secondary glaucoma) and remarked on the vagueness of some responses (i.e., responses that did not account for individual patient circumstances).
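As a minimal sketch of how these summary statistics could be computed, assuming the grades are stored as a questions-by-graders array per domain (the study's grade data are not public, so the values below are randomly generated for illustration):

```python
# Minimal analysis sketch with synthetic Likert grades; the array shape
# (40 questions x 12 graders) follows the study design, the values do not.
import numpy as np

rng = np.random.default_rng(0)
domains = ["coherency", "factuality", "comprehensiveness", "safety"]
n_questions, n_graders = 40, 12

for domain in domains:
    # Synthetic 5-point Likert grades (1 = strongly disagree ... 5 = strongly agree).
    grades = rng.integers(1, 6, size=(n_questions, n_graders))

    median_score = np.median(grades)              # reported as the per-domain median
    mean, sd = grades.mean(), grades.std(ddof=1)  # e.g., 3.7 +/- 0.9 for comprehensiveness

    # Average each question across the 12 graders, then count those below "agree" (4).
    per_question = grades.mean(axis=1)
    n_below = int((per_question < 4).sum())

    print(f"{domain}: median {median_score}, mean {mean:.1f} +/- {sd:.1f}, "
          f"{n_below}/{n_questions} questions averaged below 4")
```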

Conclusions: ChatGPT-3.5 responses to FAQs in glaucoma were generally agreeable in terms of coherency, factuality, comprehensiveness, and safety. However, areas of weakness were identified, precluding a recommendation for routine use in providing patients with tailored glaucoma counseling, especially with respect to the development of glaucoma and its management.

Financial disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
