ChatGPT-4 Effectively Responds to Common Patient Questions on Total Ankle Arthroplasty: A Surgeon-Based Assessment of AI in Patient Education

Heidi C Ventresca, Harley T Davis, Chase W Gauthier, Justin Kung, Joseph S Park, Nicholas L Strasser, Tyler A Gonzalez, J Benjamin Jackson

Foot & Ankle Orthopaedics, 10(1), 24730114251322784. Published 2025-03-27 (eCollection). DOI: 10.1177/24730114251322784. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11951880/pdf/
Abstract
Background: Patient reliance on internet resources for clinical information has steadily increased. The recent widespread accessibility of artificial intelligence (AI) tools like ChatGPT has further increased this reliance while also raising concerns about the accuracy, reliability, and appropriateness of the information these tools provide. Previous studies have found that ChatGPT could accurately respond to questions on common surgeries, such as total hip arthroplasty, but it has not been tested on uncommon procedures like total ankle arthroplasty (TAA). This study evaluates ChatGPT-4's performance in answering patient questions on TAA and further explores the opportunity for physician involvement in guiding the implementation of this technology.
Methods: Twelve commonly asked patient questions regarding TAA were collated from established sources and posed to ChatGPT-4 without additional input. Four fellowship-trained surgeons independently rated the responses using a 1-4 scale, assessing accuracy and need for clarification. Interrater reliability, divergence, and trends in response content were analyzed to evaluate consistency across responses.
Results: The mean score across all responses was 1.8, indicating overall satisfactory performance by ChatGPT-4. Ratings were consistently good on factual questions, such as infection risk and success rates, whereas questions requiring nuanced information, such as postoperative protocols and prognosis, received poorer ratings. Significant variability was observed among surgeons' ratings and between questions, reflecting differences in interpretation and expectations.
Conclusion: ChatGPT-4 demonstrates the potential to reliably provide discrete information on uncommon procedures such as TAA, but it lacks the capability to effectively answer questions requiring patient- or surgeon-specific insight. This limitation, paired with growing reliance on AI, highlights the need for AI tools tailored to specific clinical practices to enhance accuracy and relevance in patient education.