GPT-4 as a Source of Patient Information for Carpal Tunnel Surgery: A Comparative Analysis Against Google Web Search
Paul G Mastrokostas, Aaron B Lavi, Bruce B Zhang, Leonidas E Mastrokostas, Scott Liu, Katherine M Connors, Jennifer Hashem
Journal of the American Academy of Orthopaedic Surgeons, published March 25, 2025. DOI: 10.5435/JAAOS-D-24-00249
Abstract
Introduction: Carpal tunnel surgery (CTS) accounts for approximately 577,000 surgeries in the United States annually. This high frequency raises concerns over the dissemination of medical information through artificial intelligence chatbots, Google, and healthcare professionals. The objectives of this study are to determine whether GPT-4 and Google differ in (1) the type of questions asked, (2) the readability of responses, and (3) the accuracy of numerical responses for the top 10 most frequently asked questions (FAQs) about CTS.
Methods: A Google search was conducted to identify the top 10 FAQs related to CTS, which were then queried in GPT-4. Responses were categorized using the Rothwell classification system and evaluated for readability using Flesch Reading Ease and Flesch-Kincaid grade level scores. Statistical analyses included Cohen kappa coefficients for interobserver reliability and Student t-tests for comparing response characteristics. Statistical significance was set at the 0.05 level.
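For context, the Flesch Reading Ease score is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), and the Flesch-Kincaid grade level as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is a minimal illustration, not the authors' pipeline, of how this kind of readability, agreement, and t-test analysis could be run in Python; the response texts, rater labels, and the use of the textstat, SciPy, and scikit-learn packages are assumptions made for demonstration only.

```python
# Illustrative sketch only (not the authors' code): one way to reproduce the
# readability scoring, interobserver agreement, and t-test comparison
# described in the Methods, using hypothetical placeholder data.
# Requires: pip install textstat scipy scikit-learn
import textstat
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Hypothetical placeholder answers; the study used the actual Google and
# GPT-4 responses to the top 10 FAQs.
google_answers = [
    "Carpal tunnel release is usually an outpatient procedure.",
    "Most patients return to light activity within a few weeks.",
]
gpt4_answers = [
    "Carpal tunnel release surgery is typically performed as an outpatient "
    "procedure under local or regional anesthesia.",
    "Recovery timelines vary, but many patients resume light activities "
    "within two to six weeks depending on the surgical technique used.",
]

# Flesch Reading Ease (higher = easier to read) and Flesch-Kincaid grade
# level (approximate U.S. school grade) for each response.
google_fre = [textstat.flesch_reading_ease(t) for t in google_answers]
gpt4_fre = [textstat.flesch_reading_ease(t) for t in gpt4_answers]
google_fkgl = [textstat.flesch_kincaid_grade(t) for t in google_answers]
gpt4_fkgl = [textstat.flesch_kincaid_grade(t) for t in gpt4_answers]

# Two-sample t-tests comparing the two sources, significance at P < 0.05.
fre_t, fre_p = stats.ttest_ind(google_fre, gpt4_fre)
fkgl_t, fkgl_p = stats.ttest_ind(google_fkgl, gpt4_fkgl)

# Cohen's kappa for the Rothwell categories assigned by two reviewers
# (hypothetical labels); kappa = 1.0 corresponds to complete agreement.
rater1 = ["fact", "fact", "policy", "value"]
rater2 = ["fact", "fact", "policy", "value"]
kappa = cohen_kappa_score(rater1, rater2)

print(f"Reading Ease P = {fre_p:.3f}, grade level P = {fkgl_p:.3f}, "
      f"kappa = {kappa:.2f}")
```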
Results: This study found that 70% of Google's FAQs were fact-based, predominantly focusing on technical details (40%) and specific activities (40%). GPT-4's FAQs were mainly factual (50%), with technical details (40%) being the most queried topic. Complete interobserver agreement was observed. Google's answers were more readable than GPT-4's, with a Flesch Reading Ease score of 56.40 vs. 34.19 (P = 0.001) and a Flesch-Kincaid grade level of 9.93 vs. 12.85 (P = 0.007). Google's responses were also shorter, averaging 91.50 words compared with GPT-4's 162.90 (P = 0.013). For numerical responses to FAQs, GPT-4 and Google differed on nine of the 10 questions, with GPT-4 often providing broader time frames.
Conclusion: GPT-4 offers a more detailed and technically oriented approach to addressing patient queries about CTS when compared with Google. This suggests that GPT-4 can offer detailed insights where patients seek more in-depth information, enhancing the quality of healthcare education.
About the Journal
The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership’s demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues.
Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.