Evaluating if ChatGPT Can Answer Common Patient Questions Compared With OrthoInfo Regarding Rotator Cuff Tears

Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh

Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews, 9(3), published 2025-03-11. DOI: 10.5435/JAAOSGlobal-D-24-00289
Abstract
Purpose: To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.
Methods: Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT at two reading levels: standard and sixth grade. Five orthopaedic surgeons rated the accuracy and appropriateness of each response on a Likert scale, and readability was measured with the Flesch-Kincaid Grade Level. Results were analyzed with paired Student t-tests.
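For readers unfamiliar with the metrics, the following is a minimal sketch, not taken from the study, of how a Flesch-Kincaid Grade Level and a paired t-test could be computed in Python. The syllable counter is a crude vowel-group heuristic, and the Likert scores are hypothetical placeholders, not the authors' data; only the FKGL formula itself and scipy's `ttest_rel` are standard.

```python
# Sketch: Flesch-Kincaid Grade Level + paired t-test (illustrative only).
import re
from scipy import stats

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    """FKGL = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

sample = "The rotator cuff is a group of muscles and tendons that stabilize the shoulder."
print(f"FKGL of sample text: {fk_grade_level(sample):.1f}")

# Hypothetical paired Likert accuracy ratings for the same eight questions,
# one pair per question (standard vs. sixth-grade response).
standard_scores = [5, 5, 4, 5, 4, 5, 5, 4]
sixth_grade_scores = [4, 3, 4, 4, 3, 4, 3, 4]
t_stat, p_value = stats.ttest_rel(standard_scores, sixth_grade_scores)
print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.4f}")
```

A paired test is the right choice here because the two conditions (standard and sixth-grade prompts) answer the same eight questions, so each pair of scores shares a question-level baseline.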
Results: Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).
Conclusion: Standard ChatGPT responses were less accurate and appropriate than OrthoInfo responses, with worse readability. Although easier to read, sixth-grade level ChatGPT responses sacrificed accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.