Comparing AAOS appropriate use criteria with ChatGPT-4o recommendations on treating distal radius fractures

IF 0.9 | Zone 4 (Medicine) | JCR Q4 (ORTHOPEDICS)
Kareem S. Mohamed, Alexander Yu, Christoph A. Schroen, Akiro Duey, James Hong, Ryan Yu, Suhas Etigunta, Jamie Kator, Hannah S. Rhee, Michael R. Hausman
Journal: Hand Surgery & Rehabilitation, vol. 44, no. 2, Article 102122
Publication date: 2025-04-01
Publication type: Journal Article
DOI: 10.1016/j.hansur.2025.102122
URL: https://www.sciencedirect.com/science/article/pii/S2468122925000441
Citations: 0

Abstract


Introduction

The American Academy of Orthopaedic Surgeons (AAOS) developed appropriate use criteria (AUC) to guide treatment decisions for distal radius fractures based on expert consensus. This study aims to evaluate the accuracy of Chat Generative Pre-trained Transformer-4o (ChatGPT-4o) by comparing its appropriateness scores for distal radius fracture treatment with those from the AUC.

Methods

The AUC patient scenarios were categorized by factors such as fracture type (AO/OTA classification), mechanism of injury, pre-injury activity level, patient health (ASA 1–4), and associated injuries. Treatment options included percutaneous pinning, spanning external fixation, volar locking plates, dorsal plates, and immobilization methods, among others. Orthopedic surgeons assigned appropriateness scores for each treatment (1–3 = “Rarely Appropriate,” 4–6 = “May Be Appropriate,” and 7–9 = “Appropriate”). ChatGPT-4o was prompted with the same patient scenarios and asked to assign scores. Differences between AAOS and ChatGPT-4o ratings were used to calculate mean error, mean absolute error, and mean squared error. Statistical significance was assessed using Spearman correlation, and appropriateness scores were grouped into categories to determine percentage overlap between the two sources.
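The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the score lists are made-up values, the function names are hypothetical, and the sign convention (ChatGPT-4o score minus AAOS score) is an assumption, since the abstract does not state which direction the difference was taken in.

```python
def category(score):
    """Map a 1-9 appropriateness score to its AUC category."""
    if not 1 <= score <= 9:
        raise ValueError(f"score out of range: {score}")
    if score <= 3:
        return "Rarely Appropriate"
    if score <= 6:
        return "May Be Appropriate"
    return "Appropriate"


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare(aaos, gpt):
    """Summarize agreement between two lists of paired 1-9 scores."""
    n = len(aaos)
    # Assumed sign convention: ChatGPT-4o score minus AAOS score.
    errors = [g - a for a, g in zip(aaos, gpt)]
    same_category = [category(a) == category(g) for a, g in zip(aaos, gpt)]
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": sum(e * e for e in errors) / n,
        "spearman_rho": spearman(aaos, gpt),
        "category_overlap_pct": 100 * sum(same_category) / n,
    }


# Hypothetical scores for one treatment across six scenarios.
aaos_scores = [8, 7, 3, 5, 9, 2]
gpt_scores = [7, 7, 2, 3, 8, 4]
result = compare(aaos_scores, gpt_scores)
```

Under this convention a negative mean error means ChatGPT-4o scored a treatment lower than the AAOS panel on average. Overlap is computed on the three categories rather than on raw scores, so two sources can differ by a point or two and still count as agreeing.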

Results

A total of 240 patient scenarios and 2160 paired treatment scores were analyzed. The mean error for treatment options ranged from 0.6 for volar locking plate to -2.9 for dorsal plating. Pearson correlation revealed significant positive associations for dorsal spanning bridge (0.43, P < 0.001) and spanning external fixation (0.4, P < 0.001). The percentage overlap between AAOS and ChatGPT-4o in the appropriateness categories varied, with 99.17% agreement for immobilization without reduction, 90.42% for volar locking plates, and only 15% for dorsal plating.
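The reported totals can be sanity-checked with simple arithmetic. The sketch below assumes that every treatment option was scored once in each of the 240 scenarios, which the abstract implies but does not state outright; the implied scenario counts are therefore an inference, not figures reported by the authors.

```python
scenarios = 240
paired_scores = 2160

# If each option was scored in every scenario, this is the number of options.
treatments = paired_scores // scenarios

# Overlap percentages as reported in the abstract.
reported_overlap_pct = {
    "immobilization without reduction": 99.17,
    "volar locking plate": 90.42,
    "dorsal plating": 15.0,
}

# Number of scenarios (out of 240) that each percentage would correspond to.
implied_counts = {
    name: round(pct / 100 * scenarios)
    for name, pct in reported_overlap_pct.items()
}

# Convert the counts back to percentages to confirm they reproduce the report.
recomputed_pct = {
    name: round(100 * count / scenarios, 2)
    for name, count in implied_counts.items()
}
```

On this reading, the three percentages correspond to whole numbers of scenarios, which is consistent with each treatment having been rated across the full set.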

Conclusion

ChatGPT-4o does not consistently align with the appropriate use criteria in determining appropriate management of distal radius fractures. While there was moderate concordance in certain treatments, ChatGPT-4o tended to favor more conservative approaches, raising concerns about the reliability of AI-generated recommendations for medical advice and clinical decision-making.
Source journal
CiteScore: 1.70
Self-citation rate: 27.30%
Articles published: 0
Review time: 49 days
Journal description: As the official publication of the French, Belgian and Swiss Societies for Surgery of the Hand, as well as of the French Society of Rehabilitation of the Hand & Upper Limb (SFRM-GEMMSOR), "Hand Surgery and Rehabilitation" (formerly named "Chirurgie de la Main") publishes original articles, literature reviews, technical notes, and clinical cases. It is indexed in the main international databases (Medline, Embase, Pascal, Scopus). Initially a platform for French-speaking hand surgeons, the journal now publishes its articles in English to disseminate its authors' scientific findings more widely and to become a scientific and training reference for the specialty in France and Europe. It appears six times a year in English and also includes a biannual supplement in French, the GEM monograph of the French Society for Surgery of the Hand, where comprehensive reviews in the fields of hand, peripheral nerve and upper limb surgery are presented.