The Large Language Model ChatGPT-4 Exhibits Excellent Triage Capabilities and Diagnostic Performance for Patients Presenting With Various Causes of Knee Pain
Kyle N. Kunze M.D. , Nathan H. Varady M.D., M.B.A. , Michael Mazzucco B.S. , Amy Z. Lu B.S. , Jorge Chahla M.D., Ph.D. , R. Kyle Martin M.D., F.R.C.S.C. , Anil S. Ranawat M.D. , Andrew D. Pearle M.D. , Riley J. Williams III M.D.
{"title":"The Large Language Model ChatGPT-4 Exhibits Excellent Triage Capabilities and Diagnostic Performance for Patients Presenting With Various Causes of Knee Pain","authors":"Kyle N. Kunze M.D. , Nathan H. Varady M.D., M.B.A. , Michael Mazzucco B.S. , Amy Z. Lu B.S. , Jorge Chahla M.D., Ph.D. , R. Kyle Martin M.D., F.R.C.S.C. , Anil S. Ranawat M.D. , Andrew D. Pearle M.D. , Riley J. Williams III M.D.","doi":"10.1016/j.arthro.2024.06.021","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>To provide a proof-of-concept analysis of the appropriateness and performance of ChatGPT-4 to triage, synthesize differential diagnoses, and generate treatment plans concerning common presentations of knee pain.</div></div><div><h3>Methods</h3><div>Twenty knee complaints warranting triage and expanded scenarios were input into ChatGPT-4, with memory cleared prior to each new input to mitigate bias. For the 10 triage complaints, ChatGPT-4 was asked to generate a differential diagnosis that was graded for accuracy and suitability in comparison to a differential created by 2 orthopaedic sports medicine physicians. For the 10 clinical scenarios, ChatGPT-4 was prompted to provide treatment guidance for the patient, which was again graded. To test the higher-order capabilities of ChatGPT-4, further inquiry into these specific management recommendations was performed and graded.</div></div><div><h3>Results</h3><div>All ChatGPT-4 diagnoses were deemed appropriate within the spectrum of potential pathologies on a differential. The top diagnosis on the differential was identical between surgeons and ChatGPT-4 for 70% of scenarios, and the top diagnosis provided by the surgeon appeared as either the first or second diagnosis in 90% of scenarios. Overall, 16 of 30 diagnoses (53.3%) in the differential were identical. When provided with 10 expanded vignettes with a single diagnosis, the accuracy of ChatGPT-4 increased to 100%, with the suitability of management graded as appropriate in 90% of cases. Specific information pertaining to conservative management, surgical approaches, and related treatments was appropriate and accurate in 100% of cases.</div></div><div><h3>Conclusions</h3><div>ChatGPT-4 provided clinically reasonable diagnoses to triage patient complaints of knee pain due to various underlying conditions that were generally consistent with differentials provided by sports medicine physicians. Diagnostic performance was enhanced when providing additional information, allowing ChatGPT-4 to reach high predictive accuracy for recommendations concerning management and treatment options. However, ChatGPT-4 may show clinically important error rates for diagnosis depending on prompting strategy and information provided; therefore, further refinements are necessary prior to implementation into clinical workflows.</div></div><div><h3>Clinical Relevance</h3><div>Although ChatGPT-4 is increasingly being used by patients for health information, the potential for ChatGPT-4 to serve as a clinical support tool is unclear. In this study, we found that ChatGPT-4 was frequently able to diagnose and triage knee complaints appropriately as rated by sports medicine surgeons, suggesting that it may eventually be a useful clinical support tool.</div></div>","PeriodicalId":55459,"journal":{"name":"Arthroscopy-The Journal of Arthroscopic and Related Surgery","volume":"41 5","pages":"Pages 1438-1447.e14"},"PeriodicalIF":4.4000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Arthroscopy-The Journal of Arthroscopic and Related Surgery","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0749806324004560","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose
To provide a proof-of-concept analysis of the appropriateness and performance of ChatGPT-4 to triage, synthesize differential diagnoses, and generate treatment plans concerning common presentations of knee pain.
Methods
Twenty knee complaints warranting triage and expanded scenarios were input into ChatGPT-4, with memory cleared prior to each new input to mitigate bias. For the 10 triage complaints, ChatGPT-4 was asked to generate a differential diagnosis that was graded for accuracy and suitability in comparison to a differential created by 2 orthopaedic sports medicine physicians. For the 10 clinical scenarios, ChatGPT-4 was prompted to provide treatment guidance for the patient, which was again graded. To test the higher-order capabilities of ChatGPT-4, further inquiry into these specific management recommendations was performed and graded.
Results
All ChatGPT-4 diagnoses were deemed appropriate within the spectrum of potential pathologies on a differential. The top diagnosis on the differential was identical between surgeons and ChatGPT-4 for 70% of scenarios, and the top diagnosis provided by the surgeon appeared as either the first or second diagnosis in 90% of scenarios. Overall, 16 of 30 diagnoses (53.3%) in the differential were identical. When provided with 10 expanded vignettes with a single diagnosis, the accuracy of ChatGPT-4 increased to 100%, with the suitability of management graded as appropriate in 90% of cases. Specific information pertaining to conservative management, surgical approaches, and related treatments was appropriate and accurate in 100% of cases.
Conclusions
ChatGPT-4 provided clinically reasonable diagnoses to triage patient complaints of knee pain due to various underlying conditions that were generally consistent with differentials provided by sports medicine physicians. Diagnostic performance was enhanced when providing additional information, allowing ChatGPT-4 to reach high predictive accuracy for recommendations concerning management and treatment options. However, ChatGPT-4 may show clinically important error rates for diagnosis depending on prompting strategy and information provided; therefore, further refinements are necessary prior to implementation into clinical workflows.
Clinical Relevance
Although ChatGPT-4 is increasingly being used by patients for health information, the potential for ChatGPT-4 to serve as a clinical support tool is unclear. In this study, we found that ChatGPT-4 was frequently able to diagnose and triage knee complaints appropriately as rated by sports medicine surgeons, suggesting that it may eventually be a useful clinical support tool.
期刊介绍:
Nowhere is minimally invasive surgery explained better than in Arthroscopy, the leading peer-reviewed journal in the field. Every issue enables you to put into perspective the usefulness of the various emerging arthroscopic techniques. The advantages and disadvantages of these methods -- along with their applications in various situations -- are discussed in relation to their efficiency, efficacy and cost benefit. As a special incentive, paid subscribers also receive access to the journal expanded website.