Quantifying the Scope of Artificial Intelligence-Assisted Writing in Orthopaedic Medical Literature: An Analysis of Prevalence and Validation of AI-Detection Software.

Impact factor 2.6; CAS Zone 2 (Medicine); JCR Q1 (Orthopedics)
Joshua R Porto, Kerry A Morgan, Christian J Hecht, Robert J Burkhart, Raymond W Liu
Journal of the American Academy of Orthopaedic Surgeons. Journal Article, published 2024-11-19.
DOI: 10.5435/JAAOS-D-24-00084
Citations: 0

Abstract

Introduction: The popularization of generative artificial intelligence (AI), including Chat Generative Pre-trained Transformer (ChatGPT), has raised concerns about the integrity of the academic literature. This study asked the following questions: (1) Has the popularization of publicly available generative AI, such as ChatGPT, increased the prevalence of AI-generated orthopaedic literature? (2) Can AI detectors accurately identify ChatGPT-generated text? (3) Are article characteristics associated with the likelihood that an article was AI-generated?

Methods: PubMed was searched to identify articles in six major orthopaedic journals that were received for publication after January 1, 2023. Two hundred forty of these articles were randomly selected and entered into three popular AI detectors. Twenty articles published by each journal before the release of ChatGPT were randomly selected as negative control articles. Thirty-six positive control articles (six per journal) were created by using ChatGPT to alter 25%, 50%, and 100% of the text of negative control articles, and these were then used to validate each detector. For each detector, the mean percentage of text detected as AI-written was compared between articles from before and after the release of ChatGPT using an independent t-test. Multivariate regression analysis examined associations between the percentage of AI-generated text and journal, article type (ie, cohort, clinical trial, review), and month of submission.
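
The Methods compare pre- and post-release articles with an independent t-test. The abstract does not say whether the pooled-variance (Student's) or unequal-variance (Welch's) form was used. The minimal Python sketch below assumes the pooled form and computes only the t statistic and its degrees of freedom. The function name and inputs are hypothetical and do not reproduce the study's data. The p-value would then be read from a t distribution; for example, `scipy.stats.ttest_ind` with `equal_var=True` performs the complete test.

```python
import math
from statistics import mean, variance


def pooled_t_test(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Student's independent two-sample t statistic (pooled variance).

    `pre` and `post` are hypothetical per-article percentages of text that one
    detector flagged as AI-written, before and after ChatGPT's release.
    Returns (t, degrees_of_freedom); a positive t means the post-release mean is higher.
    """
    n1, n2 = len(pre), len(post)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two articles")
    # statistics.variance uses the n - 1 (sample) denominator.
    pooled_var = ((n1 - 1) * variance(pre) + (n2 - 1) * variance(post)) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("both groups have zero variance")
    # Standard error of the difference in means under the equal-variance assumption.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(post) - mean(pre)) / se
    return t, n1 + n2 - 2
```

If unequal variances are a concern, Welch's form replaces the pooled variance with separate per-group variances and uses the Welch-Satterthwaite degrees of freedom.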

Results: One AI detector consistently and accurately identified AI-generated text in the positive control articles, whereas the other two showed poor sensitivity and specificity. The most accurate detector showed a modest increase in the percentage of text detected as AI-generated in articles received after the release of ChatGPT (+1.8%, P = 0.01). Regression analysis showed no consistent associations between the likelihood of AI-generated text and journal, article type, or month of submission.
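
The Results judge the detectors by their sensitivity and specificity on the control articles. The abstract does not state how a detector's percentage score was turned into a yes/no call. The sketch below therefore assumes a hypothetical fixed threshold (50% by default). A positive control counts as correctly detected when its score reaches the threshold, and a negative control counts as correctly cleared when its score falls below it. The function name and threshold are illustrative assumptions, not the study's procedure.

```python
def sensitivity_specificity(
    positive_scores: list[float],
    negative_scores: list[float],
    threshold: float = 50.0,
) -> tuple[float, float]:
    """Summarize one detector's validation run as (sensitivity, specificity).

    positive_scores: detector output (% of text flagged as AI) for ChatGPT-altered controls.
    negative_scores: detector output for pre-ChatGPT (human-written) controls.
    threshold: hypothetical cut-off; a score >= threshold counts as "AI detected".
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative control")
    # True positives: altered controls that the detector flags.
    tp = sum(score >= threshold for score in positive_scores)
    # True negatives: human-written controls that the detector does not flag.
    tn = sum(score < threshold for score in negative_scores)
    return tp / len(positive_scores), tn / len(negative_scores)
```

The positive controls were only partly rewritten (25% to 100% of their text), so a single threshold is a simplification. Comparing each detected percentage with the known altered fraction is another plausible way to validate a detector.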

Conclusions: Because this study found an early, albeit modest, effect of generative AI on the orthopaedic literature, proper oversight will be essential for maintaining research integrity and accuracy. AI detectors may play a critical role in regulatory efforts, although they will require further development and standardized interpretation of their results.

Source journal: Journal of the American Academy of Orthopaedic Surgeons
CiteScore: 6.10
Self-citation rate: 6.20%
Articles published: 529
Review time: 4-8 weeks
Journal description: The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership's demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues. Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.