The Large Language Model ChatGPT-4 Exhibits Excellent Triage Capabilities and Diagnostic Performance for Patients Presenting With Various Causes of Knee Pain

IF 4.4 | Zone 1, Medicine | Q1, Orthopedics

Arthroscopy-The Journal of Arthroscopic and Related Surgery | Pub Date: 2025-05-01 | DOI: 10.1016/j.arthro.2024.06.021

Kyle N. Kunze M.D., Nathan H. Varady M.D., M.B.A., Michael Mazzucco B.S., Amy Z. Lu B.S., Jorge Chahla M.D., Ph.D., R. Kyle Martin M.D., F.R.C.S.C., Anil S. Ranawat M.D., Andrew D. Pearle M.D., Riley J. Williams III M.D.

Volume 41, Issue 5, Pages 1438-1447.e14. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0749806324004560


Abstract

Purpose

To provide a proof-of-concept analysis of the appropriateness and performance of ChatGPT-4 to triage, synthesize differential diagnoses, and generate treatment plans concerning common presentations of knee pain.

Methods

Twenty knee complaints warranting triage and expanded scenarios were input into ChatGPT-4, with memory cleared prior to each new input to mitigate bias. For the 10 triage complaints, ChatGPT-4 was asked to generate a differential diagnosis that was graded for accuracy and suitability in comparison to a differential created by 2 orthopaedic sports medicine physicians. For the 10 clinical scenarios, ChatGPT-4 was prompted to provide treatment guidance for the patient, which was again graded. To test the higher-order capabilities of ChatGPT-4, further inquiry into these specific management recommendations was performed and graded.

Results

All ChatGPT-4 diagnoses were deemed appropriate within the spectrum of potential pathologies on a differential. The top diagnosis on the differential was identical between surgeons and ChatGPT-4 for 70% of scenarios, and the top diagnosis provided by the surgeon appeared as either the first or second diagnosis in 90% of scenarios. Overall, 16 of 30 diagnoses (53.3%) in the differential were identical. When provided with 10 expanded vignettes with a single diagnosis, the accuracy of ChatGPT-4 increased to 100%, with the suitability of management graded as appropriate in 90% of cases. Specific information pertaining to conservative management, surgical approaches, and related treatments was appropriate and accurate in 100% of cases.
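The agreement figures above (top-diagnosis match, first-or-second match, and overall overlap) can be illustrated with a minimal sketch. The function names, the placeholder labels such as `dx_a`, and the set-based reading of "identical diagnoses" are assumptions made for illustration; the abstract does not describe how the authors computed these values, and none of the lists below are study data.

```python
from typing import Sequence


def top_k_agreement(
    reference: Sequence[Sequence[str]],
    model: Sequence[Sequence[str]],
    k: int,
) -> float:
    """Fraction of scenarios in which the reference's top diagnosis
    appears among the model's first k diagnoses."""
    hits = sum(1 for ref, out in zip(reference, model) if ref[0] in out[:k])
    return hits / len(reference)


def overlap_fraction(
    reference: Sequence[Sequence[str]],
    model: Sequence[Sequence[str]],
) -> float:
    """Fraction of reference diagnoses that also appear anywhere in the
    model's differential for the same scenario (assumed definition)."""
    shared = sum(len(set(ref) & set(out)) for ref, out in zip(reference, model))
    total = sum(len(ref) for ref in reference)
    return shared / total


# Hypothetical placeholder differentials for two scenarios (not study data).
reference = [["dx_a", "dx_b", "dx_c"], ["dx_d", "dx_e", "dx_f"]]
model = [["dx_a", "dx_c", "dx_x"], ["dx_e", "dx_d", "dx_y"]]

print(top_k_agreement(reference, model, k=1))  # 0.5
print(top_k_agreement(reference, model, k=2))  # 1.0
print(overlap_fraction(reference, model))

# The reported overall overlap is simple arithmetic: 16 of 30 diagnoses.
print(round(16 / 30 * 100, 1))  # 53.3
```

With 10 triage complaints, a 70% top-diagnosis match corresponds to 7 scenarios and a 90% first-or-second match to 9 scenarios.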

Conclusions

ChatGPT-4 provided clinically reasonable diagnoses to triage patient complaints of knee pain due to various underlying conditions that were generally consistent with differentials provided by sports medicine physicians. Diagnostic performance was enhanced when providing additional information, allowing ChatGPT-4 to reach high predictive accuracy for recommendations concerning management and treatment options. However, ChatGPT-4 may show clinically important error rates for diagnosis depending on prompting strategy and information provided; therefore, further refinements are necessary prior to implementation into clinical workflows.

Clinical Relevance

Although ChatGPT-4 is increasingly being used by patients for health information, the potential for ChatGPT-4 to serve as a clinical support tool is unclear. In this study, we found that ChatGPT-4 was frequently able to diagnose and triage knee complaints appropriately as rated by sports medicine surgeons, suggesting that it may eventually be a useful clinical support tool.




Source journal

Arthroscopy-The Journal of Arthroscopic and Related Surgery (Medicine: Surgery)

CiteScore: 9.30

Self-citation rate: 17.00%

Articles published: 555

Review time: 58 days

Journal description: Nowhere is minimally invasive surgery explained better than in Arthroscopy, the leading peer-reviewed journal in the field. Every issue enables you to put into perspective the usefulness of the various emerging arthroscopic techniques. The advantages and disadvantages of these methods, along with their applications in various situations, are discussed in relation to their efficiency, efficacy and cost benefit. As a special incentive, paid subscribers also receive access to the journal's expanded website.