Assessing the Usability of ChatGPT Responses Compared to Other Online Information in Hand Surgery
Ophelie Z Lavoie-Gagne, Oscar Y Shen, Neal C Chen, Abhiram R Bhashyam
HAND, published 2025-04-12. DOI: 10.1177/15589447251329584
Abstract
Background: ChatGPT is a natural language processing tool with potential to increase accessibility of health information. This study aimed to: (1) assess usability of online medical information for hand surgery topics; and (2) evaluate the influence of medical consensus.
Methods: Three phrases were posed 20 times each to Google, ChatGPT-3.5, and ChatGPT-4.0: "What is the cause of carpal tunnel syndrome?" (high consensus), "What is the cause of tennis elbow?" (moderate consensus), and "Platelet-rich plasma for thumb arthritis?" (low consensus). Readability was assessed by grade level while reliability and accuracy were scored based on predetermined rubrics. Scores were compared via Mann-Whitney U tests with alpha set to .05.
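For illustration, the readability scoring and statistical comparison described above could be implemented as in the following minimal sketch. It assumes a Flesch-Kincaid grade-level metric (the abstract does not name a specific readability formula) and SciPy's Mann-Whitney U test; the textstat package and the placeholder responses are illustrative assumptions, not the study's actual code or data.

```python
# Minimal sketch of the scoring pipeline, assuming Flesch-Kincaid grade level
# as the readability metric and SciPy's Mann-Whitney U test (alpha = .05).
import textstat
from scipy.stats import mannwhitneyu

# The study posed each phrase 20 times per source; two short placeholder
# responses per source stand in here for illustration.
google_responses = [
    "Carpal tunnel syndrome happens when the median nerve is squeezed at the wrist.",
    "Swelling in the wrist can press on the median nerve and cause symptoms.",
]
chatgpt_responses = [
    "Carpal tunnel syndrome results from compression of the median nerve within the carpal tunnel.",
    "Repetitive strain and anatomical factors may contribute to median nerve compression.",
]

# Readability: approximate U.S. school grade level for each response.
google_grades = [textstat.flesch_kincaid_grade(r) for r in google_responses]
chatgpt_grades = [textstat.flesch_kincaid_grade(r) for r in chatgpt_responses]

# Compare the two distributions of grade levels with a two-sided test.
stat, p = mannwhitneyu(google_grades, chatgpt_grades, alternative="two-sided")
print(f"U = {stat:.1f}, P = {p:.4f}, significant at alpha .05: {p < 0.05}")
```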
Results: Google responses had superior readability for moderate-high consensus topics (P < .0001) with an average eighth-grade reading level compared to college sophomore level for ChatGPT. Low consensus topics had poor readability throughout. ChatGPT-4 responses had similar reliability but significantly inferior readability to ChatGPT-3.5 for low medical consensus topics (P < .01). There was no significant difference in accuracy between sources. ChatGPT-4 and Google had differing coverage of cause of disease (P < .05) and procedure details/efficacy/alternatives (P < .05) with similar coverage of anatomy and pathophysiology.
Conclusions: Compared with Google, ChatGPT provides reliable medical information in responses that are substantially less readable. While patients can modulate ChatGPT readability with prompt engineering, doing so requires insight into their own health literacy and poses an additional barrier to accessing medical information. Medical consensus influences the usability of online medical information for both Google and ChatGPT. Providers should remain aware of ChatGPT's limitations in distributing medical information.
Journal introduction:
HAND is the official journal of the American Association for Hand Surgery, a peer-reviewed journal featuring articles by clinicians worldwide on current research and clinical work in hand surgery. It covers all aspects of hand and upper extremity surgery as well as the postoperative care and rehabilitation of the hand.