Performance of Artificial Intelligence in Addressing Questions Regarding Management of Osteochondritis Dissecans

John D Milner, Matthew S Quinn, Phillip Schmitt, Rigel P Hall, Steven Bokshan, Logan Petit, Ryan O'Donnell, Stephen E Marcaccio, Steven F DeFroda, Ramin R Tabaddor, Brett D Owens

Sports Health: A Multidisciplinary Approach, published online April 1, 2025. DOI: 10.1177/19417381251326549. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11966633/pdf/
Citations: 0
Abstract
Background: Large language model (LLM)-based artificial intelligence (AI) chatbots, such as ChatGPT and Gemini, have become widespread sources of information. Few studies have evaluated LLM responses to questions about orthopaedic conditions, especially osteochondritis dissecans (OCD).
Hypothesis: ChatGPT and Gemini will generate accurate responses that align with American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines.
Study design: Cohort study.
Level of evidence: Level 2.
Methods: LLM prompts were created based on AAOS clinical guidelines on OCD diagnosis and treatment, and responses from ChatGPT and Gemini were collected. Seven fellowship-trained orthopaedic surgeons evaluated LLM responses on a 5-point Likert scale, based on 6 categories: relevance, accuracy, clarity, completeness, evidence-based, and consistency.
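As an illustration of the evaluation protocol described above, a minimal sketch of how each rating could be represented is shown below. The field names, the prompt identifiers, and the structure itself are hypothetical and not taken from the paper; only the 7 raters, the 2 models, the 5-point Likert scale, and the 6 criteria come from the Methods.

```python
# Hypothetical representation of the study's rating data: each of the
# 6 criteria is scored on a 5-point Likert scale by each rater.
from dataclasses import dataclass

CRITERIA = ("relevance", "accuracy", "clarity",
            "completeness", "evidence_based", "consistency")

@dataclass
class Rating:
    rater_id: int           # one of the 7 fellowship-trained surgeons
    model: str              # "ChatGPT" or "Gemini"
    prompt_id: int          # AAOS-guideline-derived prompt (numbering assumed)
    scores: dict[str, int]  # criterion -> Likert score, 1 (worst) to 5 (best)
```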
Results: ChatGPT and Gemini exhibited strong performance across all criteria. ChatGPT mean scores were highest for clarity (4.771 ± 0.141 [mean ± SD]). Gemini scored highest for relevance and accuracy (4.286 ± 0.296, 4.286 ± 0.273). For both LLMs, the lowest scores were for evidence-based responses (ChatGPT, 3.857 ± 0.352; Gemini, 3.743 ± 0.353). For all other categories, ChatGPT mean scores were higher than Gemini scores. The consistency of responses between the 2 LLMs was rated at an overall mean of 3.486 ± 0.371. Inter-rater reliability ranged from 0.4 to 0.67 (mean, 0.59) and was highest (0.67) in the accuracy category and lowest (0.4) in the consistency category.
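For readers who want to reproduce this kind of summary, the sketch below shows one way figures like those in the Results could be computed from a prompt-by-rater matrix of Likert scores. The data are simulated, the prompt count is assumed, and the abstract does not state which inter-rater reliability statistic was used; Fleiss' kappa is shown here only as one common choice.

```python
# Illustrative arithmetic only: simulated ratings, assumed prompt count,
# and an assumed reliability statistic (Fleiss' kappa).
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

rng = np.random.default_rng(42)
n_prompts, n_raters = 10, 7  # 7 surgeon raters; prompt count assumed
# Simulated 5-point Likert scores, clustered in the 3-5 range as in the Results
likert = rng.integers(3, 6, size=(n_prompts, n_raters))

# Per-category summary in the style of the Results (e.g., 4.771 +/- 0.141):
# mean of each rater's average score, +/- SD across the 7 raters (assumed
# convention; the abstract does not specify how the SD was taken).
rater_means = likert.mean(axis=0)
print(f"clarity: {rater_means.mean():.3f} +/- {rater_means.std(ddof=1):.3f}")

# Inter-rater agreement via Fleiss' kappa over the prompt-by-rater labels
counts, _ = aggregate_raters(likert)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")
```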
Conclusion: LLM performance emphasizes the potential for gathering clinically relevant and accurate answers to questions regarding the diagnosis and treatment of OCD and suggests that ChatGPT may be a better model for this purpose than the Gemini model. Further evaluation of LLM information regarding other orthopaedic procedures and conditions may be necessary before LLMs can be recommended as an accurate source of orthopaedic information.
Clinical relevance: Little is known about the ability of AI to provide answers regarding OCD.
Journal description:
Sports Health: A Multidisciplinary Approach is an indispensable resource for all medical professionals involved in the training and care of the competitive or recreational athlete, including primary care physicians, orthopaedic surgeons, physical therapists, athletic trainers, and other health care professionals.
Published bimonthly, Sports Health is a collaborative publication from the American Orthopaedic Society for Sports Medicine (AOSSM), the American Medical Society for Sports Medicine (AMSSM), the National Athletic Trainers’ Association (NATA), and the Sports Physical Therapy Section (SPTS).
The journal publishes review articles, original research articles, case studies, images, short updates, legal briefs, editorials, and letters to the editor.
Topics include:
- Sports Injury and Treatment
- Care of the Athlete
- Athlete Rehabilitation
- Medical Issues in the Athlete
- Surgical Techniques in Sports Medicine
- Case Studies in Sports Medicine
- Images in Sports Medicine
- Legal Issues
- Pediatric Athletes
- General Sports Trauma
- Sports Psychology