A Comparison of Artificial Intelligence Platforms in the Utility of Answering Frequently Asked Questions About Carpal Tunnel Syndrome: A Cross-Sectional Study

Q3 Medicine
Calista Stevens BA, Mehreen Pasha BS, Dashun Liu MS, Andrew Block MD, Anthony Parrino MD, Craig Rodner MD
Journal: Journal of Hand Surgery Global Online, 7(6), Article 100831
DOI: 10.1016/j.jhsg.2025.100831
Published: 2025-09-20
URL: https://www.sciencedirect.com/science/article/pii/S2589514125001513
Citation count: 0

Abstract

Purpose

The rise of artificial intelligence (AI) in health care comes with increasing concerns about the use and integrity of the information it generates. Chat Generative Pre-Trained Transformer (ChatGPT) 3.5, Google Gemini, and Bing Copilot are free AI chatbot platforms that may be used to answer medical questions and disseminate medical information. Given that carpal tunnel syndrome accounts for 90% of all neuropathies, it is important to understand the accuracy of the information patients may be receiving. The purpose of this study was to determine the utility and accuracy of responses generated by ChatGPT, Google Gemini, and Bing Copilot in answering frequently asked questions about carpal tunnel syndrome.

Methods

Two independent authors scored responses using the DISCERN tool. DISCERN consists of 15 questions assessing health information on a five-point scale, with total scores ranging from 15 to 75 points. Then, a two-factor analysis of variance was conducted, with scorer and chatbot type as the factors.
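The DISCERN arithmetic described above (15 items, each rated 1 to 5, summed to a 15-75 total) can be sketched as follows. The quality bands are the ones commonly cited for DISCERN and are an assumption here; the abstract itself states only that all three chatbots fell in the "fair" range.

```python
def discern_total(item_scores):
    """Sum 15 DISCERN item ratings (each 1-5) into a 15-75 total."""
    if len(item_scores) != 15 or any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("expected 15 item scores, each rated 1-5")
    return sum(item_scores)


def discern_band(total):
    """Map a DISCERN total to a quality band.

    Band cut-offs are the commonly cited ones (an assumption, not
    stated in this paper): 63-75 excellent, 51-62 good, 39-50 fair,
    27-38 poor, 15-26 very poor.
    """
    if not 15 <= total <= 75:
        raise ValueError("DISCERN totals range from 15 to 75")
    if total >= 63:
        return "excellent"
    if total >= 51:
        return "good"
    if total >= 39:
        return "fair"
    if total >= 27:
        return "poor"
    return "very poor"
```

Under these assumed cut-offs, the reported means of 45, 48, and 46 all map to "fair", consistent with the abstract.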

Results

One-way analysis of variance revealed no significant difference in DISCERN scores among the three chatbots. Each chatbot scored in the “fair” range, with mean DISCERN scores of 45 for ChatGPT, 48 for Bing Copilot, and 46 for Google Gemini. The average Journal of the American Medical Association (JAMA) benchmark score for ChatGPT and Google Gemini surpassed that of Bing Copilot, with means of 2.3, 2.3, and 1.8, respectively.
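The one-way analysis of variance used to compare the three chatbots' DISCERN scores can be sketched in plain Python. The per-question score lists below are hypothetical illustrations, not the study's data; only the F-statistic computation itself is standard.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over k groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    # F = mean square between / mean square within.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical per-response DISCERN totals (illustration only).
chatgpt = [44, 46, 45, 45]
copilot = [47, 49, 48, 48]
gemini = [45, 47, 46, 46]
f_stat = one_way_anova_f([chatgpt, copilot, gemini])
```

The F statistic would then be compared against an F distribution with (k-1, n-k) degrees of freedom to decide significance; the study reports no significant difference among the three platforms.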

Conclusions

ChatGPT, Google Gemini, and Bing Copilot platforms generated relatively reliable answers for potential patient questions about carpal tunnel syndrome. However, users should continue to be aware of the shortcomings of the information provided, given the lack of citations, potential for misconstrued information, and perpetuated biases that inherently come with using such platforms. Future studies should explore the response quality for less common orthopedic pathologies and assess patient perceptions of response readability to determine the value of AI as a patient resource across the medical field.

Type of study/level of evidence

Cross-sectional study; Level of Evidence V
Source journal metrics: CiteScore 1.10; self-citation rate 0.00%; articles published per year: 111; review time: 12 weeks.