Accuracy assessment of ChatGPT responses to frequently asked questions regarding anterior cruciate ligament surgery

IF 1.6 · Zone 4 (Medicine) · Q3 ORTHOPEDICS

Knee · Pub Date: 2024-09-05 · DOI: 10.1016/j.knee.2024.08.014

Juan Bernardo Villarreal-Espinosa, Rodrigo Saad Berreta, Felicitas Allende, José Rafael Garcia, Salvador Ayala, Filippo Familiari, Jorge Chahla

Volume 51, Pages 84–92 · https://www.sciencedirect.com/science/article/pii/S0968016024001480

Citations: 0

Abstract

Background

The emergence of artificial intelligence (AI) has allowed users to access large sources of information in a chat-like manner. We therefore sought to evaluate the accuracy of ChatGPT-4’s responses to the 10 most frequently asked patient questions (FAQs) regarding anterior cruciate ligament (ACL) surgery.

Methods

A list of the top 10 FAQs pertaining to ACL surgery was compiled after searching all sports medicine fellowship institutions listed on the Arthroscopy Association of North America (AANA) and American Orthopaedic Society for Sports Medicine (AOSSM) websites. Two sports medicine fellowship-trained surgeons graded the accuracy of each response on a Likert scale, and inter-rater agreement was assessed with Cohen’s kappa. The reproducibility of the responses over time was also assessed.
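The abstract does not state how many Likert points were used or which weighting scheme the kappa used. As a minimal sketch, assuming an ordinal scale and linear disagreement weights (quadratic weights offered as an alternative), Cohen’s weighted kappa for two raters is κ_w = 1 − Σ w_ij·o_ij / Σ w_ij·e_ij, where o_ij are observed joint proportions, e_ij are the proportions expected from each rater’s marginals, and w_ij are disagreement weights that are zero on the diagonal. The function name `weighted_kappa`, the 3-point scale, and the example grades are hypothetical and are not the study’s data.

```python
from typing import Sequence


def weighted_kappa(
    rater_a: Sequence[int],
    rater_b: Sequence[int],
    categories: Sequence[int],
    weights: str = "linear",
) -> float:
    """Cohen's weighted kappa for two raters grading the same items on an ordinal scale.

    kappa_w = 1 - sum(w_ij * o_ij) / sum(w_ij * e_ij)
    o_ij: observed joint proportions; e_ij: proportions expected from the marginals;
    w_ij: disagreement weights (0 when both raters give the same grade).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must grade the same, non-empty set of items")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordinal categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: rows = rater A's grade, columns = rater B's grade.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i: int, j: int) -> float:
        d = abs(i - j) / (k - 1)  # 0 for identical grades, 1 for opposite ends of the scale
        return d if weights == "linear" else d * d

    obs_dis = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - obs_dis / exp_dis


# Hypothetical grades on a hypothetical 3-point scale (1 = inaccurate ... 3 = completely accurate).
surgeon_1 = [1, 2, 3, 3]
surgeon_2 = [1, 3, 3, 2]
print(round(weighted_kappa(surgeon_1, surgeon_2, categories=[1, 2, 3]), 3))  # prints 0.429
```

Linear weights penalize a one-step disagreement half as much as a two-step one on a 3-point scale; quadratic weights penalize near-misses even less, so the same grades give a higher kappa with `weights="quadratic"`.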

Results

Five of the 10 responses were graded ‘completely accurate’ by both fellowship-trained surgeons, and three additional responses were graded ‘completely accurate’ by at least one of them. Inter-rater agreement on accuracy between the two fellowship-trained attending surgeons was moderate (weighted kappa = 0.57, 95% confidence interval 0.15–0.99). In addition, 80% of the responses (8 of 10) were reproducible over time.
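The abstract labels κ = 0.57 as ‘moderate’ without naming the interpretation scale. Assuming the widely used Landis and Koch (1977) bands, a minimal sketch of that mapping, applied to the reported point estimate and both 95% confidence bounds, could look like this; the function name `landis_koch_label` is hypothetical.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch (1977) agreement bands (assumed scale)."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0:
        return "poor"
    # Inclusive upper bound of each band; anything above 0.80 is "almost perfect".
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Values reported in the Results: point estimate, then the 95% CI bounds.
for value in (0.57, 0.15, 0.99):
    print(value, landis_koch_label(value))
```

Under this assumed scale the point estimate falls in the ‘moderate’ band, while the interval bounds map to ‘slight’ and ‘almost perfect’, which reflects how wide the interval is with only 10 graded items.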

Conclusion

ChatGPT can be considered an accurate supplementary tool for answering general patient questions about ACL surgery. Nonetheless, patient–surgeon interaction should not be deferred and must remain the primary source of information. The general recommendation is therefore to address any questions with a qualified specialist.


Source journal: Knee (category: Medicine, Surgery)

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 171

Review time: 6 months

Journal description: The Knee is an international journal publishing studies on the clinical treatment and fundamental biomechanical characteristics of the knee joint. The aim of the journal is to provide a vehicle relevant to surgeons, biomedical engineers, imaging specialists, materials scientists, rehabilitation personnel and all those with an interest in the knee. The topics covered include, but are not limited to:
• Anatomy, physiology, morphology and biochemistry;
• Biomechanical studies;
• Advances in the development of prosthetic, orthotic and augmentation devices;
• Imaging and diagnostic techniques;
• Pathology;
• Trauma;
• Surgery;
• Rehabilitation.