Evaluating if ChatGPT Can Answer Common Patient Questions Compared With OrthoInfo Regarding Rotator Cuff Tears.

Impact Factor: 2.0 | JCR Quartile: Q2 (Orthopedics)
Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh
{"title":"Evaluating if ChatGPT Can Answer Common Patient Questions Compared With OrthoInfo Regarding Rotator Cuff Tears.","authors":"Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh","doi":"10.5435/JAAOSGlobal-D-24-00289","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.</p><p><strong>Methods: </strong>Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT at two levels: standard and at a sixth-grade reading level. Five orthopaedic surgeons assessed the accuracy and appropriateness of responses using a Likert scale, and the Flesch-Kincaid Grade Level measured readability. Results were analyzed with a paired Student t-test.</p><p><strong>Results: </strong>Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).</p><p><strong>Conclusion: </strong>Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.</p>","PeriodicalId":45062,"journal":{"name":"Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews","volume":"9 3","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11905972/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5435/JAAOSGlobal-D-24-00289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Citations: 0

Abstract

Purpose: To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.

Methods: Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT at two reading levels: standard and sixth grade. Five orthopaedic surgeons assessed the accuracy and appropriateness of each response on a Likert scale, and readability was measured with the Flesch-Kincaid Grade Level. Results were analyzed with a paired Student t-test.
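For reference, the Flesch-Kincaid Grade Level cited in the methods is the standard readability index that maps average sentence length and average syllables per word onto a U.S. school grade. The abstract does not restate it, but the conventional formula is:

```latex
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
```

Higher values indicate text that requires more years of schooling to read comfortably, which is why prompting ChatGPT for a sixth-grade version was expected to lower the score.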

Results: Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).
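To make the statistical comparison concrete, the sketch below runs a paired Student t-test of the kind reported above. The per-question scores are hypothetical 5-point Likert ratings invented for illustration (the abstract reports only means and standard deviations, not raw per-question data); `scipy.stats.ttest_rel` is SciPy's paired-test routine.

```python
# Hedged sketch of the paired Student t-test used to compare response sources.
# All scores below are hypothetical Likert ratings (1-5), one value per
# OrthoInfo question; the study's raw per-question data are not published
# in the abstract.
from scipy import stats

# Hypothetical mean accuracy ratings for the eight questions, paired by question.
chatgpt_standard = [4.8, 4.6, 5.0, 4.4, 4.8, 4.6, 4.8, 4.6]
orthoinfo = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]

# ttest_rel performs a dependent-samples (paired) t-test across questions.
result = stats.ttest_rel(chatgpt_standard, orthoinfo)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

A paired test is appropriate here because each ChatGPT response and each OrthoInfo response answer the same underlying question, so the comparison is made within each question rather than between independent groups.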

Conclusion: Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.
