Assessing information provided via artificial intelligence regarding distal biceps tendon repair surgery
Suhasini Gupta, Brett D. Haislup, Ryan A. Hoffman, Anand M. Murthi
Journal of Experimental Orthopaedics, 12(2), 19 May 2025. DOI: 10.1002/jeo2.70281
Abstract
Purpose
The purpose of this study was to analyze the quality, accuracy, reliability and readability of information provided by the artificial intelligence (AI) model ChatGPT (OpenAI, San Francisco) regarding distal biceps tendon repair surgery.
Methods
ChatGPT 3.5 was used to answer 27 questions commonly asked by patients regarding 'distal biceps repair surgery'. These questions were categorized using the Rothwell criteria of Fact, Policy and Value. The answers generated by ChatGPT were analyzed using the DISCERN scale, the Journal of the American Medical Association (JAMA) benchmark criteria, the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
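For reference, the two readability indices are defined by the standard Flesch formulas, computed from word, sentence and syllable counts:

\[ \mathrm{FRES} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \]

\[ \mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

Lower FRES values indicate harder text (scores below roughly 30 are conventionally interpreted as college-graduate-level reading), while FKGL approximates the U.S. school grade level required to understand the text.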
Results
The DISCERN score for Fact-based questions was 59, for Policy 61 and for Value 59 (all considered 'good' scores). The JAMA benchmark score was 0, the lowest possible score, for all three categories of Fact, Policy and Value. The FRES scores were 24.49 for Fact, 22.82 for Policy and 21.77 for Value, and the FKGL scores were 14.96 for Fact, 14.78 for Policy and 15.00 for Value.
Conclusion
In terms of quality assessment, the answers provided by ChatGPT were a 'good' source, comparable to other online resources that do not offer citations. The accuracy and reliability of these answers were low, and their readability was at nearly a college-graduate level. Physicians should therefore caution patients who turn to ChatGPT for information regarding distal biceps repair. ChatGPT is a promising source for patients to learn about their procedure, although its limited reliability and difficult readability are disadvantages for the average patient using the software.