Assessing information provided via artificial intelligence regarding distal biceps tendon repair surgery
Suhasini Gupta, Brett D. Haislup, Ryan A. Hoffman, Anand M. Murthi
Journal of Experimental Orthopaedics, 12(2), 19 May 2025. DOI: 10.1002/jeo2.70281
Abstract
Purpose
The purpose of this study was to analyze the quality, accuracy, reliability and readability of information provided by the artificial intelligence (AI) model ChatGPT (OpenAI, San Francisco) regarding distal biceps tendon repair surgery.
Methods
ChatGPT 3.5 was used to answer 27 questions commonly asked by patients regarding 'distal biceps repair surgery'. These questions were categorized using the Rothwell criteria of Fact, Policy and Value. The answers generated by ChatGPT were analyzed using the DISCERN scale, the Journal of the American Medical Association (JAMA) benchmark criteria, the Flesch Reading Ease Score (FRES) and the Flesch-Kincaid Grade Level (FKGL).
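For reference, the two readability indices are defined by the standard Flesch formulas, computed from word, sentence and syllable counts:

\[ \mathrm{FRES} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}} \]

\[ \mathrm{FKGL} = 0.39\,\frac{\text{total words}}{\text{total sentences}} + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \]

Lower FRES values indicate harder text (scores below roughly 30 are conventionally interpreted as college-graduate-level reading), while FKGL approximates the U.S. school grade level required to understand the text.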
Results
The DISCERN score for Fact-based questions was 59, for Policy 61 and for Value 59 (all considered 'good' scores). The JAMA benchmark score was 0, the lowest possible score, for all three categories of Fact, Policy and Value. The FRES scores were 24.49 for Fact, 22.82 for Policy and 21.77 for Value, and the FKGL scores were 14.96 for Fact, 14.78 for Policy and 15.00 for Value.
Conclusion
In terms of quality assessment, the answers provided by ChatGPT were a 'good' source, comparable to other online resources that do not offer citations. The accuracy and reliability of these answers were low, and their readability was at nearly a college-graduate level. Physicians should therefore caution patients who turn to ChatGPT for information regarding distal biceps repair. ChatGPT is a promising source for patients to learn about their procedure, although its limited reliability and difficult readability are disadvantages for the average patient using the software.