IF 2.7 · CAS Tier 2 (Medicine) · JCR Q1 (Sport Sciences)
John D Milner, Matthew S Quinn, Phillip Schmitt, Rigel P Hall, Steven Bokshan, Logan Petit, Ryan O'Donnell, Stephen E Marcaccio, Steven F DeFroda, Ramin R Tabaddor, Brett D Owens
{"title":"Performance of Artificial Intelligence in Addressing Questions Regarding Management of Osteochondritis Dissecans.","authors":"John D Milner, Matthew S Quinn, Phillip Schmitt, Rigel P Hall, Steven Bokshan, Logan Petit, Ryan O'Donnell, Stephen E Marcaccio, Steven F DeFroda, Ramin R Tabaddor, Brett D Owens","doi":"10.1177/19417381251326549","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Large language model (LLM)-based artificial intelligence (AI) chatbots, such as ChatGPT and Gemini, have become widespread sources of information. Few studies have evaluated LLM responses to questions about orthopaedic conditions, especially osteochondritis dissecans (OCD).</p><p><strong>Hypothesis: </strong>ChatGPT and Gemini will generate accurate responses that align with American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines.</p><p><strong>Study design: </strong>Cohort study.</p><p><strong>Level of evidence: </strong>Level 2.</p><p><strong>Methods: </strong>LLM prompts were created based on AAOS clinical guidelines on OCD diagnosis and treatment, and responses from ChatGPT and Gemini were collected. Seven fellowship-trained orthopaedic surgeons evaluated LLM responses on a 5-point Likert scale, based on 6 categories: relevance, accuracy, clarity, completeness, evidence-based, and consistency.</p><p><strong>Results: </strong>ChatGPT and Gemini exhibited strong performance across all criteria. ChatGPT mean scores were highest for clarity (4.771 ± 0.141 [mean ± SD]). Gemini scored highest for relevance and accuracy (4.286 ± 0.296, 4.286 ± 0.273). For both LLMs, the lowest scores were for evidence-based responses (ChatGPT, 3.857 ± 0.352; Gemini, 3.743 ± 0.353). For all other categories, ChatGPT mean scores were higher than Gemini scores. The consistency of responses between the 2 LLMs was rated at an overall mean of 3.486 ± 0.371. Inter-rater reliability ranged from 0.4 to 0.67 (mean, 0.59) and was highest (0.67) in the accuracy category and lowest (0.4) in the consistency category.</p><p><strong>Conclusion: </strong>LLM performance emphasizes the potential for gathering clinically relevant and accurate answers to questions regarding the diagnosis and treatment of OCD and suggests that ChatGPT may be a better model for this purpose than the Gemini model. Further evaluation of LLM information regarding other orthopaedic procedures and conditions may be necessary before LLMs can be recommended as an accurate source of orthopaedic information.</p><p><strong>Clinical relevance: </strong>Little is known about the ability of AI to provide answers regarding OCD.</p>","PeriodicalId":54276,"journal":{"name":"Sports Health-A Multidisciplinary Approach","volume":" ","pages":"19417381251326549"},"PeriodicalIF":2.7000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11966633/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sports Health-A Multidisciplinary Approach","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1177/19417381251326549","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SPORT SCIENCES","Score":null,"Total":0}

Performance of Artificial Intelligence in Addressing Questions Regarding Management of Osteochondritis Dissecans.

Background: Large language model (LLM)-based artificial intelligence (AI) chatbots, such as ChatGPT and Gemini, have become widespread sources of information. Few studies have evaluated LLM responses to questions about orthopaedic conditions, especially osteochondritis dissecans (OCD).

Hypothesis: ChatGPT and Gemini will generate accurate responses that align with American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines.

Study design: Cohort study.

Level of evidence: Level 2.

Methods: LLM prompts were created based on AAOS clinical guidelines on OCD diagnosis and treatment, and responses from ChatGPT and Gemini were collected. Seven fellowship-trained orthopaedic surgeons evaluated LLM responses on a 5-point Likert scale, based on 6 categories: relevance, accuracy, clarity, completeness, evidence-based, and consistency.
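
The abstract does not describe how responses were collected, and the study may well have used the chatbots' web interfaces. For readers who want to reproduce the setup programmatically, a minimal sketch using the openai and google-generativeai Python clients is shown below; the model names and the two example prompts are illustrative assumptions, not details taken from the study (the authors derived their prompts from AAOS clinical practice guidelines).

```python
# Sketch only: collecting chatbot responses to guideline-derived prompts.
# Assumptions: model names ("gpt-4o", "gemini-1.5-pro") and the prompts
# are placeholders, not the versions or questions evaluated in the paper.
from openai import OpenAI
import google.generativeai as genai

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
genai.configure(api_key="YOUR_GOOGLE_API_KEY")
gemini = genai.GenerativeModel("gemini-1.5-pro")

# Illustrative prompts in the spirit of AAOS guideline statements on OCD.
prompts = [
    "What imaging is recommended to diagnose osteochondritis dissecans of the knee?",
    "When is non-operative management appropriate for a stable OCD lesion?",
]

responses = []
for prompt in prompts:
    chatgpt_reply = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    gemini_reply = gemini.generate_content(prompt).text
    responses.append(
        {"prompt": prompt, "chatgpt": chatgpt_reply, "gemini": gemini_reply}
    )
```

Each collected response pair would then be scored independently by the seven raters across the six categories before aggregation.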

Results: ChatGPT and Gemini exhibited strong performance across all criteria. ChatGPT mean scores were highest for clarity (4.771 ± 0.141 [mean ± SD]). Gemini scored highest for relevance and accuracy (4.286 ± 0.296, 4.286 ± 0.273). For both LLMs, the lowest scores were for evidence-based responses (ChatGPT, 3.857 ± 0.352; Gemini, 3.743 ± 0.353). For all other categories, ChatGPT mean scores were higher than Gemini scores. The consistency of responses between the 2 LLMs was rated at an overall mean of 3.486 ± 0.371. Inter-rater reliability ranged from 0.4 to 0.67 (mean, 0.59) and was highest (0.67) in the accuracy category and lowest (0.4) in the consistency category.
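
The abstract reports inter-rater reliability without naming the statistic used. One common choice for this design (every rater scores every item) is a two-way random-effects intraclass correlation, ICC(2,1); the self-contained sketch below computes it, along with a mean ± SD summary, over illustrative Likert scores rather than the study's data.

```python
# Sketch: per-category mean +/- SD and a Shrout-Fleiss ICC(2,1).
# Assumption: the paper does not name its reliability coefficient;
# ICC(2,1) is shown here as one common option. Scores are made up.
import numpy as np

def icc2_1(x: np.ndarray) -> float:
    """ICC(2,1) for an n_targets x k_raters matrix of ratings."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-question means
    col_means = x.mean(axis=0)  # per-rater means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between-question MS
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between-rater MS
    sse = np.sum((x - row_means[:, None] - col_means[None, :] + grand) ** 2)
    mse = sse / ((n - 1) * (k - 1))                        # residual MS
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Illustrative 5-point Likert scores: 4 questions rated by 7 surgeons.
scores = np.array([
    [5, 4, 5, 4, 5, 4, 5],
    [4, 4, 3, 4, 5, 4, 4],
    [5, 5, 4, 5, 4, 5, 5],
    [3, 4, 4, 3, 4, 4, 3],
])

print(f"mean = {scores.mean():.3f} +/- {scores.std(ddof=1):.3f} (SD)")
print(f"ICC(2,1) = {icc2_1(scores):.2f}")
```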

Conclusion: These results highlight the potential of LLMs to provide clinically relevant and accurate answers to questions regarding the diagnosis and treatment of OCD, and suggest that ChatGPT may be better suited to this purpose than Gemini. Further evaluation of LLM-generated information on other orthopaedic procedures and conditions may be necessary before LLMs can be recommended as an accurate source of orthopaedic information.

Clinical relevance: Little is known about the ability of AI to provide answers regarding OCD.

Source journal: Sports Health-A Multidisciplinary Approach (Medicine: Orthopedics and Sports Medicine)
CiteScore: 6.90
Self-citation rate: 9.10%
Articles published: 101

Journal description: Sports Health: A Multidisciplinary Approach is an indispensable resource for all medical professionals involved in the training and care of the competitive or recreational athlete, including primary care physicians, orthopaedic surgeons, physical therapists, athletic trainers and other medical and health care professionals. Published bimonthly, Sports Health is a collaborative publication from the American Orthopaedic Society for Sports Medicine (AOSSM), the American Medical Society for Sports Medicine (AMSSM), the National Athletic Trainers' Association (NATA), and the Sports Physical Therapy Section (SPTS). The journal publishes review articles, original research articles, case studies, images, short updates, legal briefs, editorials, and letters to the editor. Topics include:
- Sports Injury and Treatment
- Care of the Athlete
- Athlete Rehabilitation
- Medical Issues in the Athlete
- Surgical Techniques in Sports Medicine
- Case Studies in Sports Medicine
- Images in Sports Medicine
- Legal Issues
- Pediatric Athletes
- General Sports Trauma
- Sports Psychology