ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements.

Dominick J Casciato, Joshua Calhoun
{"title":"ChatGPT仅与ACFAS专家小组成员临床共识声明达成公平协议。","authors":"Dominick J Casciato, Joshua Calhoun","doi":"10.1177/19386400251319567","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly-from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery.</p><p><strong>Methods: </strong>Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as being inappropriate, neither appropriate nor inappropriate, and appropriate. Ten repetitions of the statements were entered into ChatGPT-o1 in a random order, and the model was prompted to assign a corresponding rating. The AI-generated scores were compared to the expert panel's ratings, and intra-rater analysis was performed.</p><p><strong>Results: </strong>The analysis of 9 clinical consensus documents and 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair alignment between expert panelists and ChatGPT. Overall, ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot exhibited the lowest agreement at 25%, reflecting variability between ChatGPT and expert panelists. Among the ChatGPT ratings, Cohen's kappa values ranged from 0.41 to 0.92, highlighting variability in internal reliability across topics.</p><p><strong>Conclusion: </strong>ChatGPT achieved overall fair agreement and demonstrated variable consistency when repetitively rating ACFAS expert panel clinical practice guidelines representing a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between intelligence and human intelligence.</p><p><strong>Level of evidence: </strong>Level IV: Retrospective cohort study.</p>","PeriodicalId":73046,"journal":{"name":"Foot & ankle specialist","volume":" ","pages":"19386400251319567"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ChatGPT Achieves Only Fair Agreement with ACFAS Expert Panelist Clinical Consensus Statements.\",\"authors\":\"Dominick J Casciato, Joshua Calhoun\",\"doi\":\"10.1177/19386400251319567\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly-from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery.</p><p><strong>Methods: </strong>Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as being inappropriate, neither appropriate nor inappropriate, and appropriate. Ten repetitions of the statements were entered into ChatGPT-o1 in a random order, and the model was prompted to assign a corresponding rating. 
The AI-generated scores were compared to the expert panel's ratings, and intra-rater analysis was performed.</p><p><strong>Results: </strong>The analysis of 9 clinical consensus documents and 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair alignment between expert panelists and ChatGPT. Overall, ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot exhibited the lowest agreement at 25%, reflecting variability between ChatGPT and expert panelists. Among the ChatGPT ratings, Cohen's kappa values ranged from 0.41 to 0.92, highlighting variability in internal reliability across topics.</p><p><strong>Conclusion: </strong>ChatGPT achieved overall fair agreement and demonstrated variable consistency when repetitively rating ACFAS expert panel clinical practice guidelines representing a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between intelligence and human intelligence.</p><p><strong>Level of evidence: </strong>Level IV: Retrospective cohort study.</p>\",\"PeriodicalId\":73046,\"journal\":{\"name\":\"Foot & ankle specialist\",\"volume\":\" \",\"pages\":\"19386400251319567\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-02-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Foot & ankle specialist\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/19386400251319567\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foot & ankle specialist","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/19386400251319567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract


Introduction: As artificial intelligence (AI) becomes increasingly integrated into medicine and surgery, its applications are expanding rapidly, from aiding clinical documentation to providing patient information. However, its role in medical decision-making remains uncertain. This study evaluates an AI language model's alignment with clinical consensus statements in foot and ankle surgery.

Methods: Clinical consensus statements from the American College of Foot and Ankle Surgeons (ACFAS; 2015-2022) were collected and rated by ChatGPT-o1 as inappropriate, neither appropriate nor inappropriate, or appropriate. The statements were entered into ChatGPT-o1 ten times in random order, and the model was prompted to assign a corresponding rating. The AI-generated scores were compared to the expert panel's ratings, and intra-rater analysis was performed.
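
To make the agreement analysis concrete, the sketch below is a minimal illustration (not the authors' code) of comparing expert panel ratings against repeated model ratings with Cohen's kappa. The statements, ratings, and variable names are hypothetical, and scikit-learn's cohen_kappa_score stands in for whatever statistical software the study actually used.

```python
# Illustrative sketch only: hypothetical statements and ratings, not the
# study's actual data or analysis code.
from itertools import combinations

from sklearn.metrics import cohen_kappa_score

# Three-point rating scale described in the Methods.
LABELS = ["inappropriate", "neither", "appropriate"]

# Hypothetical expert panel ratings for five consensus statements.
panel = ["appropriate", "appropriate", "neither", "inappropriate", "appropriate"]

# Hypothetical ChatGPT ratings from two of the ten repetitions.
gpt_runs = [
    ["appropriate", "neither", "neither", "inappropriate", "appropriate"],
    ["appropriate", "appropriate", "neither", "neither", "appropriate"],
]

# Agreement between the expert panel and each model repetition.
for i, run in enumerate(gpt_runs, start=1):
    kappa = cohen_kappa_score(panel, run, labels=LABELS)
    print(f"panel vs run {i}: kappa = {kappa:.2f}")

# Intra-rater consistency: agreement between pairs of model repetitions.
for (i, a), (j, b) in combinations(enumerate(gpt_runs, start=1), 2):
    kappa = cohen_kappa_score(a, b, labels=LABELS)
    print(f"run {i} vs run {j}: kappa = {kappa:.2f}")
```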

Results: The analysis of 9 clinical consensus documents and 129 statements revealed an overall Cohen's kappa of 0.29 (95% CI: 0.12, 0.46), indicating fair alignment between expert panelists and ChatGPT. Overall, ankle arthritis and heel pain showed the highest concordance at 100%, while flatfoot exhibited the lowest agreement at 25%, reflecting variability between ChatGPT and expert panelists. Among the ChatGPT ratings, Cohen's kappa values ranged from 0.41 to 0.92, highlighting variability in internal reliability across topics.
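
For context on how these values are conventionally read (an interpretive note, not part of the study's results), Cohen's kappa corrects observed agreement for the agreement expected by chance:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed proportion of agreement and p_e is the proportion expected by chance. On the commonly used Landis and Koch scale, 0.21 to 0.40 is read as fair agreement and 0.41 to 0.60 as moderate, which is consistent with the overall kappa of 0.29 being described as fair.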

Conclusion: ChatGPT achieved overall fair agreement and demonstrated variable consistency when repeatedly rating ACFAS expert panel clinical consensus statements spanning a variety of topics. These data reflect the need for further study of the causes, impacts, and solutions for this disparity between artificial intelligence and human intelligence.

Level of evidence: Level IV: Retrospective cohort study.
