Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh
{"title":"与OrthoInfo相比,评估ChatGPT是否能回答关于肩袖撕裂的常见问题。","authors":"Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh","doi":"10.5435/JAAOSGlobal-D-24-00289","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.</p><p><strong>Methods: </strong>Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT at two levels: standard and at a sixth-grade reading level. Five orthopaedic surgeons assessed the accuracy and appropriateness of responses using a Likert scale, and the Flesch-Kincaid Grade Level measured readability. Results were analyzed with a paired Student t-test.</p><p><strong>Results: </strong>Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).</p><p><strong>Conclusion: </strong>Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. 
At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.</p>","PeriodicalId":45062,"journal":{"name":"Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews","volume":"9 3","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11905972/pdf/","citationCount":"0","resultStr":"{\"title\":\"Evaluating if ChatGPT Can Answer Common Patient Questions Compared With OrthoInfo Regarding Rotator Cuff Tears.\",\"authors\":\"Alexander Jurayj, Julio Nerys-Figueroa, Emil Espinal, Michael A Gaudiani, Travis Baes, Jared Mahylis, Stephanie Muh\",\"doi\":\"10.5435/JAAOSGlobal-D-24-00289\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.</p><p><strong>Methods: </strong>Eight questions from the OrthoInfo rotator cuff tear web page were input into ChatGPT at two levels: standard and at a sixth-grade reading level. Five orthopaedic surgeons assessed the accuracy and appropriateness of responses using a Likert scale, and the Flesch-Kincaid Grade Level measured readability. Results were analyzed with a paired Student t-test.</p><p><strong>Results: </strong>Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. 
OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).</p><p><strong>Conclusion: </strong>Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.</p>\",\"PeriodicalId\":45062,\"journal\":{\"name\":\"Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews\",\"volume\":\"9 3\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-03-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11905972/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of the American Academy of Orthopaedic Surgeons Global Research and Reviews\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5435/JAAOSGlobal-D-24-00289\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/3/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"ORTHOPEDICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Academy of Orthopaedic Surgeons Global Research and 
Reviews","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5435/JAAOSGlobal-D-24-00289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/3/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ORTHOPEDICS","Score":null,"Total":0}
Evaluating if ChatGPT Can Answer Common Patient Questions Compared With OrthoInfo Regarding Rotator Cuff Tears.
Purpose: To evaluate ChatGPT's (OpenAI) ability to provide accurate, appropriate, and readable responses to common patient questions about rotator cuff tears.
Methods: Eight questions from the OrthoInfo rotator cuff tear web page were submitted to ChatGPT in two forms: at a standard level and rewritten at a sixth-grade reading level. Five orthopaedic surgeons rated the accuracy and appropriateness of each response on a Likert scale, and readability was measured with the Flesch-Kincaid Grade Level. Results were analyzed with a paired Student t-test.
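For readers unfamiliar with the readability metric, the following is a minimal sketch of the Flesch-Kincaid Grade Level computation. It is illustrative only, not the tool used in the study: the vowel-group syllable counter is a rough heuristic, whereas production readability tools use dictionary-based syllable counts.

```python
# Flesch-Kincaid Grade Level:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
# Higher scores indicate harder text; a sixth-grade target corresponds
# to a score near 6.
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count contiguous vowel groups, minimum of one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
```

A short monosyllabic sentence scores well below grade 6, while dense polysyllabic prose scores far above it, which is the gap the study measured between sixth-grade and standard responses.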
Results: Standard ChatGPT responses scored higher in accuracy (4.7 ± 0.47 vs. 3.6 ± 0.76; P < 0.001) and appropriateness (4.5 ± 0.57 vs. 3.7 ± 0.98; P < 0.001) compared with sixth-grade responses. However, standard ChatGPT responses were less accurate (4.7 ± 0.47 vs. 5.0 ± 0.0; P = 0.004) and appropriate (4.5 ± 0.57 vs. 5.0 ± 0.0; P = 0.016) when compared with OrthoInfo responses. OrthoInfo responses were also notably better than sixth-grade responses in both accuracy and appropriateness (P < 0.001). Standard responses had a higher Flesch-Kincaid grade level compared with both OrthoInfo and sixth-grade responses (P < 0.001).
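The comparisons above pair ratings by question: the same eight questions are scored under two conditions, which is what makes a paired t-test appropriate. A minimal pure-Python sketch of that test follows; the ratings are made-up placeholders, not the study's data.

```python
# Paired t-test: t = mean(d) / (stdev(d) / sqrt(n)), where d are the
# per-question rating differences between the two conditions.
from math import sqrt
from statistics import mean, stdev

def paired_t(a: list[float], b: list[float]) -> float:
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical accuracy ratings for the eight questions (placeholders).
standard    = [5, 5, 4, 5, 5, 4, 5, 5]
sixth_grade = [4, 3, 4, 3, 4, 3, 4, 4]

t = paired_t(standard, sixth_grade)
# With n = 8 pairs (df = 7), |t| > 2.365 rejects the null at alpha = 0.05.
```

In practice one would use a statistics package (e.g. `scipy.stats.ttest_rel`) to obtain the exact p-value rather than comparing against a critical value by hand.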
Conclusion: Standard ChatGPT responses were less accurate and appropriate, with worse readability compared with OrthoInfo responses. Despite being easier to read, sixth-grade level ChatGPT responses compromised on accuracy and appropriateness. At this time, ChatGPT is not recommended as a standalone source for patient information on rotator cuff tears but may supplement information provided by orthopaedic surgeons.