Can ChatGPT Reliably Answer the Most Common Patient Questions Regarding Total Shoulder Arthroplasty?

IF 2.9 · JCR Q1 (Orthopedics) · CAS Tier 2 (Medicine)
Christopher A White, Yehuda A Masturov, Eric Haunschild, Evan Michaelson, Dave R Shukla, Paul J Cagle
Journal of Shoulder and Elbow Surgery · DOI: 10.1016/j.jse.2024.08.025 · Published 2024-10-15
Citations: 0

Abstract

Background: Increasingly, patients are turning to artificial intelligence (AI) programs such as ChatGPT to answer medical questions either before or after consulting a physician. Although ChatGPT's popularity implies its potential in improving patient education, concerns exist regarding the validity of the chatbot's responses. Therefore, the objective of this study was to evaluate the quality and accuracy of ChatGPT's answers to commonly asked patient questions surrounding total shoulder arthroplasty (TSA).

Methods: Eleven trusted healthcare websites were searched to compose a list of the 15 most frequently asked patient questions about TSA. Each question was posed to the ChatGPT user interface, with no follow-up questions or opportunity for clarification permitted. Individual response accuracy was graded by three board-certified orthopedic surgeons using a letter grading system (A-F). Overall grades, descriptive analyses, and commentary were provided for each of the ChatGPT responses.

Results: Overall, ChatGPT received a cumulative grade of B-. The responses to general/preoperative questions and to postoperative questions each received a grade of B-. ChatGPT adequately addressed patient questions with sound recommendations. However, the chatbot neglected recent research in its responses, resulting in recommendations that warrant professional clarification. The chatbot deferred to orthopedic surgeons in 8 of 15 responses, suggesting an awareness of its own limitations. Moreover, ChatGPT often went beyond the scope of the question after the first two sentences, and generally made errors when attempting to supplement its own response.

Conclusion: Overall, this is the first study to our knowledge to utilize AI to answer the most common patient questions surrounding TSA. ChatGPT achieved an overall grade of B-. Ultimately, while AI is an attractive tool for initial patient inquiries, at this time its responses to TSA-specific questions cannot substitute for the knowledge of an orthopedic surgeon.

Journal metrics: CiteScore 6.50 · Self-citation rate 23.30% · Articles per year 604 · Review time 11.2 weeks
About the journal: The official publication for eight leading specialty organizations, this authoritative journal is the only publication to focus exclusively on medical, surgical, and physical techniques for treating injury/disease of the upper extremity, including the shoulder girdle, arm, and elbow. Clinically oriented and peer-reviewed, the Journal provides an international forum for the exchange of information on new techniques, instruments, and materials. Journal of Shoulder and Elbow Surgery features vivid photos, professional illustrations, and explicit diagrams that demonstrate surgical approaches and depict implant devices. Topics covered include fractures, dislocations, diseases and injuries of the rotator cuff, imaging techniques, arthritis, arthroscopy, arthroplasty, and rehabilitation.