Michael A McNamara, Brandon G Hill, Peter L Schilling
{"title":"The Challenges of Using ChatGPT for Clinical Decision Support in Orthopaedic Surgery: A Pilot Study.","authors":"Michael A McNamara, Brandon G Hill, Peter L Schilling","doi":"10.5435/JAAOS-D-24-01072","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Artificial intelligence (AI) technologies have recently exploded in both accessibility and applicability, including in health care. Although studies have demonstrated its ability to adequately answer simple patient issues or multiple-choice questions, its capacity for deeper complex decision making within health care is relatively untested. In this study, we aimed to delve into AI's ability to integrate multiple clinical data sources and produce a reasonable assessment and plan, specifically in the setting of an orthopaedic surgery consultant.</p><p><strong>Methods: </strong>Ten common fractures seen by orthopaedic surgeons in the emergency department were chosen. Consult notes from patients sustaining each of these fractures, seen at a level 1 academic trauma center between 2022 and 2023, were stripped of patient data. The history, physical examination, and imaging interpretations were then given to ChatGPT4 in raw and semistructured formats. The AI was asked to determine an assessment and plan as if it were an orthopaedic surgeon. The generated plans were then compared with the actual clinical course of the patient, as determined by our multispecialty trauma conference.</p><p><strong>Results: </strong>When given both raw and semistructured formats of clinical data, ChatGPT4 determined safe and reasonable plans that included the final clinical outcome of the patient scenario. 
Evaluating large language models is an ongoing field of research without an established quantitative rubric; therefore, our conclusions rely on subjective comparison.</p><p><strong>Conclusion: </strong>When given history, physical examination, and imaging interpretations, ChatGPT is able to synthesize complex clinical data into a reasonable and most importantly safe assessment and plan for common fractures seen by orthopaedic surgeons. Evaluating large language models is an ongoing challenge; however, using actual clinical courses as a \"benchmark\" for comparison presents a possible avenue for further research.</p>","PeriodicalId":51098,"journal":{"name":"Journal of the American Academy of Orthopaedic Surgeons","volume":" ","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2025-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Academy of Orthopaedic Surgeons","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.5435/JAAOS-D-24-01072","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0
Abstract
Background: Artificial intelligence (AI) technologies have recently exploded in both accessibility and applicability, including in health care. Although studies have demonstrated their ability to adequately address simple patient concerns or answer multiple-choice questions, their capacity for deeper, complex decision making within health care is relatively untested. In this study, we aimed to assess AI's ability to integrate multiple clinical data sources and produce a reasonable assessment and plan, specifically in the role of an orthopaedic surgery consultant.
Methods: Ten common fractures seen by orthopaedic surgeons in the emergency department were chosen. Consult notes from patients sustaining each of these fractures, seen at a level 1 academic trauma center between 2022 and 2023, were stripped of identifying patient data. The history, physical examination, and imaging interpretations were then given to ChatGPT-4 in raw and semistructured formats. The AI was asked to generate an assessment and plan as if it were an orthopaedic surgeon. The generated plans were then compared with the actual clinical course of each patient, as determined by our multispecialty trauma conference.
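The semistructured prompting workflow described in the Methods can be sketched as follows. This is an illustrative reconstruction, not the authors' actual code: the field names, prompt wording, and example consult content are assumptions.

```python
# Hypothetical sketch of the Methods workflow: assemble a de-identified
# consult note (history, exam, imaging) into a semistructured prompt that
# asks the model to act as an orthopaedic surgery consultant.
# All wording and field names here are illustrative assumptions.

def build_consult_prompt(history: str, exam: str, imaging: str) -> str:
    """Combine consult-note components into one semistructured prompt."""
    return (
        "You are an orthopaedic surgery consultant. Based on the following "
        "de-identified consult note, provide an assessment and plan.\n\n"
        f"History: {history}\n"
        f"Physical examination: {exam}\n"
        f"Imaging interpretation: {imaging}\n"
    )

# Sending the prompt would use a chat-completion API, e.g. the OpenAI SDK:
#   from openai import OpenAI
#   client = OpenAI()
#   reply = client.chat.completions.create(
#       model="gpt-4",
#       messages=[{"role": "user", "content": prompt}],
#   )

# Example with invented consult content (a common distal radius fracture):
prompt = build_consult_prompt(
    "72-year-old with right wrist pain after a fall from standing height",
    "Dorsal wrist deformity; sensation intact; 2+ radial pulse",
    "Dorsally angulated, displaced distal radius fracture",
)
```

The study compared the model's responses under both this semistructured format and a raw (unformatted) consult-note format.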
Results: When given clinical data in both raw and semistructured formats, ChatGPT-4 produced safe and reasonable plans that included the final clinical outcome of each patient scenario. Evaluating large language models is an ongoing field of research without an established quantitative rubric; therefore, our conclusions rely on subjective comparison.
Conclusion: When given the history, physical examination, and imaging interpretations, ChatGPT is able to synthesize complex clinical data into a reasonable and, most importantly, safe assessment and plan for common fractures seen by orthopaedic surgeons. Evaluating large language models is an ongoing challenge; however, using actual clinical courses as a "benchmark" for comparison presents a possible avenue for further research.
About the journal:
The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership’s demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues.
Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.