American Academy of Orthopaedic Surgeons' OrthoInfo provides more readable information regarding meniscus injury than ChatGPT-4 while information accuracy is comparable
Camden Bohn, Catherine Hand, Shadia Tannir, Marisa Ulrich, Sami Saniei, Miguel Girod-Hoffman, Yining Lu, Aaron Krych, Brian Forsythe
{"title":"American academy of Orthopedic Surgeons' OrthoInfo provides more readable information regarding meniscus injury than ChatGPT-4 while information accuracy is comparable","authors":"Camden Bohn , Catherine Hand , Shadia Tannir , Marisa Ulrich , Sami Saniei , Miguel Girod-Hoffman , Yining Lu , Aaron Krych , Brian Forsythe","doi":"10.1016/j.jisako.2025.100843","DOIUrl":null,"url":null,"abstract":"<div><h3>Introduction</h3><div>Over 61% of Americans seek health information online, often using artificial intelligence (AI) tools like ChatGPT. However, concerns persist about the readability and accessibility of AI-generated content, especially for individuals with varying health literacy levels. This study compares the readability and accuracy of ChatGPT responses on meniscus injuries with those from the American Academy of Orthopaedic Surgeons' OrthoInfo website, which is tailored for patient education. We hypothesize that while ChatGPT offers accurate information, its readability will be lower than that of OrthoInfo.</div></div><div><h3>Methods</h3><div>Seven frequently asked questions about meniscus injuries were used to compare responses from ChatGPT-4 and OrthoInfo. Readability was assessed using multiple calculators (Flesch-Kincaid, Gunning fog index, Coleman-Liau, Simple Measure of Gobbledygook Readability Formula, FORCAST Readability Formula, Fry graph, and Raygor Readability Estimate), and accuracy was evaluated by three independent reviewers on a 4-point scale. Statistical analysis included independent t-tests to compare readability and accuracy between the two sources.</div></div><div><h3>Results</h3><div>ChatGPT responses required a significantly higher education level to comprehend, with an average reading grade level of 13.8 compared to 9.8 for OrthoInfo (p < 0.01). The Flesch Reading Ease Index also indicated lower readability for ChatGPT (32.0 vs. 59.9, p < 0.01). However, both ChatGPT and OrthoInfo responses were highly accurate, with all but one ChatGPT response receiving the highest accuracy rating of 4. The response to physical exam findings was less accurate (3.3 vs. 3.6, p = 0.52).</div></div><div><h3>Conclusion</h3><div>While AI-generated responses were accurate, their readability made them less accessible than OrthoInfo, which is designed for a broad audience. This study underscores the importance of clear, accessible information for meniscal injuries and suggests that AI tools should incorporate readability metrics to enhance patient comprehension. Despite the potential of AI, resources like OrthoInfo remain essential for effectively communicating health information to the public.</div></div><div><h3>Level of evidence</h3><div>IV.</div></div>","PeriodicalId":36847,"journal":{"name":"Journal of ISAKOS Joint Disorders & Orthopaedic Sports Medicine","volume":"11 ","pages":"Article 100843"},"PeriodicalIF":2.7000,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of ISAKOS Joint Disorders & Orthopaedic Sports Medicine","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2059775425004602","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Abstract
Introduction
Over 61% of Americans seek health information online, often using artificial intelligence (AI) tools like ChatGPT. However, concerns persist about the readability and accessibility of AI-generated content, especially for individuals with varying health literacy levels. This study compares the readability and accuracy of ChatGPT responses on meniscus injuries with those from the American Academy of Orthopaedic Surgeons' OrthoInfo website, which is tailored for patient education. We hypothesize that while ChatGPT offers accurate information, its readability will be lower than that of OrthoInfo.
Methods
Seven frequently asked questions about meniscus injuries were used to compare responses from ChatGPT-4 and OrthoInfo. Readability was assessed using multiple calculators (Flesch-Kincaid Grade Level, Gunning Fog Index, Coleman-Liau Index, Simple Measure of Gobbledygook (SMOG), FORCAST Readability Formula, Fry Graph, and Raygor Readability Estimate), and accuracy was evaluated by three independent reviewers on a 4-point scale. Statistical analysis included independent t-tests to compare readability and accuracy between the two sources.
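The workflow described above can be illustrated with a minimal computational sketch. The example below is hypothetical and assumes the Python `textstat` and `scipy` packages (the study does not specify its tooling); it covers only the formula-based metrics named in the Methods (the Fry Graph and Raygor Estimate are graph-based and omitted), and the response texts are placeholders rather than the actual study data.

```python
# Minimal sketch: scoring response readability and comparing the two sources.
# Assumes the `textstat` and `scipy` packages; the authors' actual tooling is not specified.
import textstat
from scipy import stats

def readability_profile(text: str) -> dict:
    """Return a subset of the readability metrics named in the Methods.
    (The Fry Graph and Raygor Estimate are graph-based and omitted here.)"""
    return {
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "gunning_fog": textstat.gunning_fog(text),
        "coleman_liau": textstat.coleman_liau_index(text),
        "smog": textstat.smog_index(text),
    }

# Hypothetical response texts, one list per source (placeholders, not study data).
chatgpt_responses = ["A meniscus tear is a common knee injury ...", "..."]
orthoinfo_responses = ["The meniscus acts as a shock absorber ...", "..."]

chatgpt_grades = [readability_profile(t)["flesch_kincaid_grade"] for t in chatgpt_responses]
orthoinfo_grades = [readability_profile(t)["flesch_kincaid_grade"] for t in orthoinfo_responses]

# Independent two-sample t-test on reading grade level, as described in the Methods.
t_stat, p_value = stats.ttest_ind(chatgpt_grades, orthoinfo_grades)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```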
Results
ChatGPT responses required a significantly higher education level to comprehend, with an average reading grade level of 13.8 compared with 9.8 for OrthoInfo (p < 0.01). The Flesch Reading Ease Index also indicated lower readability for ChatGPT (32.0 vs. 59.9, p < 0.01). However, responses from both ChatGPT and OrthoInfo were highly accurate: all but one ChatGPT response received the highest accuracy rating of 4. The exception, the response addressing physical examination findings, was rated slightly lower than its OrthoInfo counterpart (3.3 vs. 3.6, p = 0.52), a difference that was not statistically significant.
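For context, the standard published Flesch formulas behind these scores are shown below (they are not restated in the abstract itself). On the conventional Flesch scale, the reported 32.0 for ChatGPT falls in the "difficult/college" band (roughly 30-50), while 59.9 for OrthoInfo sits just below the "plain English" band (roughly 60-70), consistent with the grade-level gap reported above.

```latex
% Standard published Flesch formulas (not specific to this study):
% Flesch Reading Ease (higher = easier) and Flesch-Kincaid Grade Level.
\[
\mathrm{FRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}}
             - 84.6\,\frac{\text{total syllables}}{\text{total words}}
\]
\[
\mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}}
              + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59
\]
```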
Conclusion
While AI-generated responses were accurate, their lower readability made them less accessible than OrthoInfo, which is written for a broad audience. This study underscores the importance of clear, accessible information on meniscal injuries and suggests that AI tools should incorporate readability metrics to enhance patient comprehension. Despite the potential of AI, resources like OrthoInfo remain essential for effectively communicating health information to the public.
Level of evidence
IV