GPT-4 as a Source of Patient Information for Carpal Tunnel Surgery: A Comparative Analysis Against Google Web Search
Paul G Mastrokostas, Aaron B Lavi, Bruce B Zhang, Leonidas E Mastrokostas, Scott Liu, Katherine M Connors, Jennifer Hashem
Journal of the American Academy of Orthopaedic Surgeons, published March 25, 2025. DOI: 10.5435/JAAOS-D-24-00249
Abstract
Introduction: Carpal tunnel surgery (CTS) accounts for approximately 577,000 surgeries in the United States annually. This high frequency raises concerns over the dissemination of medical information through artificial intelligence chatbots, Google, and healthcare professionals. The objectives of this study are to determine whether GPT-4 and Google differ in (1) the type of questions asked, (2) the readability of responses, and (3) the accuracy of numerical responses for the top 10 most frequently asked questions (FAQs) about CTS.
Methods: A Google search was conducted to identify the top 10 FAQs related to CTS, which were then queried in GPT-4. Responses were categorized using the Rothwell classification system and evaluated for readability using Flesch Reading Ease and Flesch-Kincaid grade level scores. Statistical analyses included Cohen kappa coefficients for interobserver reliability and Student t-tests for comparing response characteristics. Statistical significance was set at the 0.05 level.
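For context, the Flesch Reading Ease score is computed as 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word), and the Flesch-Kincaid grade level as 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is a minimal illustration, not the authors' pipeline, of how this kind of readability, agreement, and t-test analysis could be run in Python; the response texts, rater labels, and the use of the textstat, SciPy, and scikit-learn packages are assumptions made for demonstration only.

```python
# Illustrative sketch only (not the authors' code): one way to reproduce the
# readability scoring, interobserver agreement, and t-test comparison
# described in the Methods, using hypothetical placeholder data.
# Requires: pip install textstat scipy scikit-learn
import textstat
from scipy import stats
from sklearn.metrics import cohen_kappa_score

# Hypothetical placeholder answers; the study used the actual Google and
# GPT-4 responses to the top 10 FAQs.
google_answers = [
    "Carpal tunnel release is usually an outpatient procedure.",
    "Most patients return to light activity within a few weeks.",
]
gpt4_answers = [
    "Carpal tunnel release surgery is typically performed as an outpatient "
    "procedure under local or regional anesthesia.",
    "Recovery timelines vary, but many patients resume light activities "
    "within two to six weeks depending on the surgical technique used.",
]

# Flesch Reading Ease (higher = easier to read) and Flesch-Kincaid grade
# level (approximate U.S. school grade) for each response.
google_fre = [textstat.flesch_reading_ease(t) for t in google_answers]
gpt4_fre = [textstat.flesch_reading_ease(t) for t in gpt4_answers]
google_fkgl = [textstat.flesch_kincaid_grade(t) for t in google_answers]
gpt4_fkgl = [textstat.flesch_kincaid_grade(t) for t in gpt4_answers]

# Two-sample t-tests comparing the two sources, significance at P < 0.05.
fre_t, fre_p = stats.ttest_ind(google_fre, gpt4_fre)
fkgl_t, fkgl_p = stats.ttest_ind(google_fkgl, gpt4_fkgl)

# Cohen's kappa for the Rothwell categories assigned by two reviewers
# (hypothetical labels); kappa = 1.0 corresponds to complete agreement.
rater1 = ["fact", "fact", "policy", "value"]
rater2 = ["fact", "fact", "policy", "value"]
kappa = cohen_kappa_score(rater1, rater2)

print(f"Reading Ease P = {fre_p:.3f}, grade level P = {fkgl_p:.3f}, "
      f"kappa = {kappa:.2f}")
```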
Results: This study found that 70% of Google's FAQs were fact-based, predominantly focusing on technical details (40%) and specific activities (40%). GPT-4's FAQs were mainly factual (50%), with technical details (40%) being the most queried topic. Complete interobserver agreement was observed. Google's answers were more readable than GPT-4's, with a Flesch Reading Ease score of 56.40 vs. 34.19 (P = 0.001) and a Flesch-Kincaid grade level of 9.93 vs. 12.85 (P = 0.007). Google's responses were also shorter, averaging 91.50 words compared with GPT-4's 162.90 (P = 0.013). For numerical responses to FAQs, GPT-4 and Google differed on nine of the 10 questions, with GPT-4 often providing broader time frames.
Conclusion: GPT-4 offers a more detailed and technically oriented approach to addressing patient queries about CTS when compared with Google. This suggests that GPT-4 can offer detailed insights where patients seek more in-depth information, enhancing the quality of healthcare education.
About the Journal
The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership’s demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues.
Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.