ChatGPT-3.5 and -4 provide mostly accurate information when answering patients’ questions relating to femoroacetabular impingement syndrome and arthroscopic hip surgery

Impact factor: 2.7 (JCR Q1, Orthopedics)
David Slawaska-Eng, Yoan Bourgeault-Gagnon, Dan Cohen, Thierry Pauyo, Etienne L. Belzile, Olufemi R. Ayeni
Journal of ISAKOS Joint Disorders & Orthopaedic Sports Medicine, volume 10, Article 100376. Published 2025-02-01. Journal article.
DOI: 10.1016/j.jisako.2024.100376
URL: https://www.sciencedirect.com/science/article/pii/S2059775424002232
Citations: 0

Abstract

Objectives

This study aimed to evaluate the accuracy of ChatGPT in answering patient questions about femoroacetabular impingement (FAI) and arthroscopic hip surgery, comparing the performance of versions ChatGPT-3.5 (free) and ChatGPT-4 (paid).

Methods

Twelve frequently asked questions (FAQs) relating to FAI were selected and posed to ChatGPT-3.5 and ChatGPT-4. The responses were assessed for accuracy by three hip arthroscopy surgeons using a four-tier grading system. Statistical analyses included Wilcoxon signed-rank tests and Gwet's AC2 coefficient for interrater agreement corrected for chance and employing quadratic weights.
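As an illustration of the paired comparison named above, the following is a minimal Python sketch of a two-sided Wilcoxon signed-rank test with an exact p-value obtained by enumerating sign patterns. It is not the authors' analysis code. The function names are hypothetical, the scores are made up, the 1 to 4 numeric coding of the grading scale is an assumption, and the abstract does not state which scores were paired or whether an exact or approximate test was used.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, exact two-sided p) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    if not diffs:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    total = sum(ranks)
    observed = abs(w_plus - total / 2)
    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns at least as far from the centre as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - total / 2) >= observed - 1e-12:
            extreme += 1
    return w_plus, w_minus, extreme / 2 ** len(ranks)


# Made-up paired scores (assumed coding: 1 = excellent ... 4 = unsatisfactory).
# These are NOT the study's data.
scores_a = [2, 2, 1, 3, 2, 1]
scores_b = [1, 2, 1, 2, 2, 2]
print(wilcoxon_signed_rank(scores_a, scores_b))
```

Enumerating all sign patterns is only practical for small samples; with twelve questions there are at most 4,096 patterns, so the exact approach stays cheap.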

Results

The median ratings for responses ranged from “excellent not requiring clarification” to “satisfactory requiring moderate clarification.” No responses were rated as “unsatisfactory requiring substantial clarification.” The median accuracy scores were 2 (range 1–3) for ChatGPT-3.5 and 1.5 (range 1–3) for ChatGPT-4, with 25% of ChatGPT-3.5's responses and 50% of ChatGPT-4's responses rated as “excellent.” There was no statistical difference in performance between the two versions (p = 0.279), although ChatGPT-4 showed a tendency towards higher accuracy in some areas. Interrater agreement was substantial for ChatGPT-3.5 (Gwet's AC2 = 0.79 [95% confidence interval (CI) = 0.6–0.94]) and moderate to substantial for ChatGPT-4 (Gwet's AC2 = 0.65 [95% CI = 0.43–0.87]).
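For readers unfamiliar with the agreement statistic reported above, the following is a minimal Python sketch of Gwet's AC2 point estimate with quadratic weights for several raters. It follows the standard textbook definition rather than the authors' code. The function name `gwet_ac2` is hypothetical, the ratings are made up, the 1 to 4 numeric coding of the grading scale is an assumption, and the confidence interval is not computed.

```python
def gwet_ac2(ratings, categories):
    """Gwet's AC2 point estimate with quadratic weights.

    ratings: one list per rated item, holding each rater's category
             (None marks a missing rating).
    categories: ordered list of all possible categories.
    """
    q = len(categories)
    index = {c: k for k, c in enumerate(categories)}
    # Quadratic weights: 1 for identical categories, 0 for the two extremes.
    w = [[1 - (k - l) ** 2 / (q - 1) ** 2 for l in range(q)] for k in range(q)]

    # counts[i][k] = number of raters who placed item i in category k
    counts = []
    for row in ratings:
        c = [0] * q
        for r in row:
            if r is not None:
                c[index[r]] += 1
        if sum(c) > 0:
            counts.append(c)

    # Weighted observed agreement, using items rated by at least two raters.
    terms = []
    for c in counts:
        r_i = sum(c)
        if r_i < 2:
            continue
        weighted = [sum(w[k][l] * c[l] for l in range(q)) for k in range(q)]
        terms.append(sum(c[k] * (weighted[k] - 1) for k in range(q)) / (r_i * (r_i - 1)))
    if not terms:
        raise ValueError("need at least one item rated by two or more raters")
    p_a = sum(terms) / len(terms)

    # Chance agreement from the average category prevalences.
    pi = [sum(c[k] / sum(c) for c in counts) / len(counts) for k in range(q)]
    t_w = sum(sum(row) for row in w)
    p_e = t_w / (q * (q - 1)) * sum(p * (1 - p) for p in pi)
    return (p_a - p_e) / (1 - p_e)


# Made-up ratings from three raters on four items (assumed coding:
# 1 = excellent ... 4 = unsatisfactory). These are NOT the study's data.
example = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 1, 1]]
print(gwet_ac2(example, categories=[1, 2, 3, 4]))
```

With quadratic weights, ratings one tier apart still count as near agreement, so the coefficient penalizes large disagreements much more than small ones.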

Conclusion

Both versions of ChatGPT provided mostly accurate responses to FAQs on FAI and arthroscopic surgery, with no significant difference between the versions. The findings suggest potential utility of ChatGPT in patient education, though cautious implementation and further evaluation are recommended due to variability in response accuracy and low power of the study.

Level of evidence

IV.
Source journal metrics: CiteScore 2.90; self-citation rate 6.20%; articles published 61; review time 108 days.