ChatGPT-4 Knows Its A B C D E but Cannot Cite Its Source.

JBJS Open Access, vol. 9, no. 3. Published 2024-09-05; eCollection 2024-07-01. DOI: 10.2106/JBJS.OA.24.00099
Diane Ghanem, Alexander R Zhu, Whitney Kagabo, Greg Osgood, Babar Shafiq

Abstract

Introduction: The artificial intelligence language model Chat Generative Pretrained Transformer (ChatGPT) has shown potential as a reliable and accessible educational resource in orthopaedic surgery. Yet, the accuracy of the references behind the provided information remains elusive, which poses a concern for maintaining the integrity of medical content. This study aims to examine the accuracy of the references provided by ChatGPT-4 concerning the Airway, Breathing, Circulation, Disability, Exposure (ABCDE) approach in trauma surgery.

Methods: Two independent reviewers critically assessed 30 ChatGPT-4-generated references supporting the well-established ABCDE approach to trauma protocol, grading each as 0 (nonexistent), 1 (inaccurate), or 2 (accurate). All discrepancies between the ChatGPT-4 references and the corresponding PubMed references were carefully reviewed and bolded. Inter-reviewer agreement on the accuracy scores was assessed with Cohen's kappa coefficient. Descriptive statistics were used to summarize the mean reference accuracy scores, and one-way analysis of variance (ANOVA) was used to compare the mean scores across the 5 categories.
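The two statistics named in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and all sample grades are hypothetical, and because the abstract does not say whether a weighted kappa was used, plain (unweighted) Cohen's kappa is assumed. The ANOVA helper returns only the F statistic and its degrees of freedom; the p-value would come from the upper tail of the F distribution, for example via `scipy.stats.f.sf(f_stat, df_between, df_within)`.

```python
from collections import Counter
from statistics import mean


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must grade the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items that received the same grade from both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal grade counts, summed over grades.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[g] * count_b[g] for g in set(rater_a) | set(rater_b)) / n**2
    if p_e == 1:
        # Both raters gave one identical grade to every item; kappa is undefined.
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical grades (0 = nonexistent, 1 = inaccurate, 2 = accurate) for six references.
reviewer_1 = [2, 2, 1, 0, 1, 2]
reviewer_2 = [2, 1, 1, 0, 1, 2]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.3f}")

# Hypothetical per-reference scores, one list per ABCDE category.
categories = [[2, 1, 2], [1, 1, 2], [2, 0, 1], [1, 2, 2], [0, 1, 2]]
print(one_way_anova_f(categories))
```

If each of the 30 references contributes one score to the comparison across 5 categories, the degrees of freedom would be df_between = 5 − 1 = 4 and df_within = 30 − 5 = 25, whatever the split between categories.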

Results: ChatGPT-4 had an average reference accuracy score of 66.7%. Of the 30 references, only 43.3% were accurate and deemed "true", while 56.7% were categorized as "false" (43.3% inaccurate and 13.3% nonexistent). Accuracy was consistent across the 5 trauma protocol categories, with no statistically significant difference (p = 0.437).
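The category percentages in the Results follow from a simple tally of the 0/1/2 grades. As a hedged check, the sketch below tallies a list of grades and applies the abstract's rule that only grade 2 counts as "true". The grade counts used (13 accurate, 13 inaccurate, 4 nonexistent out of 30) are inferred from the reported rounded percentages rather than taken from the paper's data, and `summarize_grades` is a hypothetical helper.

```python
from collections import Counter

# Grading scale from the Methods: 0 = nonexistent, 1 = inaccurate, 2 = accurate.
GRADE_LABELS = {0: "nonexistent", 1: "inaccurate", 2: "accurate"}


def summarize_grades(grades):
    """Return the percentage of references in each grade and in the true/false split."""
    n = len(grades)
    counts = Counter(grades)
    summary = {label: 100 * counts[g] / n for g, label in GRADE_LABELS.items()}
    # Only accurate references (grade 2) count as "true"; grades 0 and 1 are "false".
    summary["true"] = summary["accurate"]
    summary["false"] = summary["inaccurate"] + summary["nonexistent"]
    return summary


# Counts inferred from the reported percentages: 13/30, 13/30, and 4/30 (an assumption).
grades = [2] * 13 + [1] * 13 + [0] * 4
for label, pct in summarize_grades(grades).items():
    print(f"{label}: {pct:.1f}%")
```

With these inferred counts the script prints 13.3% nonexistent, 43.3% inaccurate, 43.3% accurate, and a 56.7% "false" share, matching the category breakdown reported above.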

Discussion: With 57% of references being inaccurate or nonexistent, ChatGPT-4 fell short of providing reliable and reproducible references, a concerning finding for the safety of using ChatGPT-4 in professional medical decision-making without thorough verification. Only if used cautiously, with cross-referencing, can this language model act as an adjunct learning tool that enhances comprehensiveness as well as knowledge rehearsal and manipulation.
