Assessing the Usability of ChatGPT Responses Compared to Other Online Information in Hand Surgery
Ophelie Z Lavoie-Gagne, Oscar Y Shen, Neal C Chen, Abhiram R Bhashyam
HAND, published 2025-04-12. DOI: 10.1177/15589447251329584
Abstract
Background: ChatGPT is a natural language processing tool with potential to increase accessibility of health information. This study aimed to: (1) assess usability of online medical information for hand surgery topics; and (2) evaluate the influence of medical consensus.
Methods: Three phrases were posed 20 times each to Google, ChatGPT-3.5, and ChatGPT-4.0: "What is the cause of carpal tunnel syndrome?" (high consensus), "What is the cause of tennis elbow?" (moderate consensus), and "Platelet-rich plasma for thumb arthritis?" (low consensus). Readability was assessed by grade level while reliability and accuracy were scored based on predetermined rubrics. Scores were compared via Mann-Whitney U tests with alpha set to .05.
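For illustration, the readability scoring and statistical comparison described above could be implemented as in the following minimal sketch. It assumes a Flesch-Kincaid grade-level metric (the abstract does not name a specific readability formula) and SciPy's Mann-Whitney U test; the textstat package and the placeholder responses are illustrative assumptions, not the study's actual code or data.

```python
# Minimal sketch of the scoring pipeline, assuming Flesch-Kincaid grade level
# as the readability metric and SciPy's Mann-Whitney U test (alpha = .05).
import textstat
from scipy.stats import mannwhitneyu

# The study posed each phrase 20 times per source; two short placeholder
# responses per source stand in here for illustration.
google_responses = [
    "Carpal tunnel syndrome happens when the median nerve is squeezed at the wrist.",
    "Swelling in the wrist can press on the median nerve and cause symptoms.",
]
chatgpt_responses = [
    "Carpal tunnel syndrome results from compression of the median nerve within the carpal tunnel.",
    "Repetitive strain and anatomical factors may contribute to median nerve compression.",
]

# Readability: approximate U.S. school grade level for each response.
google_grades = [textstat.flesch_kincaid_grade(r) for r in google_responses]
chatgpt_grades = [textstat.flesch_kincaid_grade(r) for r in chatgpt_responses]

# Compare the two distributions of grade levels with a two-sided test.
stat, p = mannwhitneyu(google_grades, chatgpt_grades, alternative="two-sided")
print(f"U = {stat:.1f}, P = {p:.4f}, significant at alpha .05: {p < 0.05}")
```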
Results: Google responses had superior readability for moderate-high consensus topics (P < .0001) with an average eighth-grade reading level compared to college sophomore level for ChatGPT. Low consensus topics had poor readability throughout. ChatGPT-4 responses had similar reliability but significantly inferior readability to ChatGPT-3.5 for low medical consensus topics (P < .01). There was no significant difference in accuracy between sources. ChatGPT-4 and Google had differing coverage of cause of disease (P < .05) and procedure details/efficacy/alternatives (P < .05) with similar coverage of anatomy and pathophysiology.
Conclusions: Compared with Google, ChatGPT provides reliable medical information in responses that are substantially less readable. While patients can modulate ChatGPT readability with prompt engineering, doing so requires insight into their own health literacy and poses an additional barrier to accessing medical information. Medical consensus influences the usability of online medical information for both Google and ChatGPT. Providers should remain aware of ChatGPT's limitations in distributing medical information.
Journal introduction:
HAND is the official journal of the American Association for Hand Surgery, a peer-reviewed journal featuring articles by clinicians worldwide on current research and clinical work in hand surgery. It covers all aspects of hand and upper extremity surgery as well as the postoperative care and rehabilitation of the hand.