Quantifying the Scope of Artificial Intelligence-Assisted Writing in Orthopaedic Medical Literature: An Analysis of Prevalence and Validation of AI-Detection Software.

IF 2.6 · Region 2 (Medicine) · JCR Q1 ORTHOPEDICS

Journal of the American Academy of Orthopaedic Surgeons. Pub Date: 2025-01-01. Epub Date: 2024-11-19. DOI: 10.5435/JAAOS-D-24-00084

Joshua R Porto, Kerry A Morgan, Christian J Hecht, Robert J Burkhart, Raymond W Liu

Pages: 42-50. Publication type: Journal Article.

Citations: 0

Abstract

Introduction: The popularization of generative artificial intelligence (AI), including Chat Generative Pre-trained Transformer (ChatGPT), has raised concerns about the integrity of academic literature. This study asked the following questions: (1) Has the popularization of publicly available generative AI, such as ChatGPT, increased the prevalence of AI-generated orthopaedic literature? (2) Can AI detectors accurately identify ChatGPT-generated text? (3) Are there associations between article characteristics and the likelihood that an article was AI generated?

Methods: PubMed was searched across six major orthopaedic journals to identify articles received for publication after January 1, 2023. Two hundred and forty articles were randomly selected and entered into three popular AI detectors. Twenty articles published by each journal before the release of ChatGPT were randomly selected as negative control articles. Thirty-six positive control articles (six per journal) were created by altering 25%, 50%, and 100% of the text from negative control articles using ChatGPT and were then used to validate each detector. For each detector, the mean percentage of text detected as written by AI was compared between pre-ChatGPT and post-ChatGPT release articles using an independent t-test. Multivariate regression analysis was conducted using percentage AI-generated text per journal, article type (ie, cohort, clinical trial, review), and month of submission.
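The pre-release versus post-release comparison described in the Methods can be illustrated with a minimal sketch of an independent two-sample t statistic. This is not the authors' analysis code: the function name `pooled_t_statistic`, the equal-variance (pooled) form of the test, and the score values are all assumptions made for illustration, since the abstract does not state which variant was used and reports no raw data.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for Student's independent t-test.

    Assumes equal variances in the two groups (an assumption of this
    sketch; the abstract does not say which variant was applied).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("both groups are constant; t is undefined")
    # Positive t means group_b has the higher mean.
    return (mean(group_b) - mean(group_a)) / std_err, df


# Hypothetical "% of text flagged as AI" values, NOT data from the study.
pre_release = [1.0, 2.0, 3.0]
post_release = [2.0, 3.0, 4.0]
t_stat, df = pooled_t_statistic(pre_release, post_release)
print(f"t = {t_stat:.3f}, df = {df}")
```

Turning the statistic into a P value requires the t distribution, which a statistics library would normally supply; that step is left out here to keep the sketch dependency-free.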

Results: One AI detector consistently and accurately identified AI-generated text in positive control articles, whereas two others showed poor sensitivity and specificity. The most accurate detector showed a modest increase in the percentage of text detected as AI generated for articles received after the release of ChatGPT (+1.8%, P = 0.01). Regression analysis showed no consistent associations between likelihood of AI-generated text and journal, article type, or month of submission.
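Sensitivity and specificity, the two measures used above to judge the detectors, can be computed from control articles as in the following sketch. The function name, the 50% decision threshold, and the example scores are hypothetical; the abstract does not report how detector output was dichotomized or give per-article values.

```python
def sensitivity_specificity(scores, is_ai, threshold):
    """Compute (sensitivity, specificity) for a detector.

    scores    -- detector output per article (e.g. % of text flagged as AI)
    is_ai     -- True for positive controls, False for negative controls
    threshold -- score at or above which an article counts as "flagged"
                 (a hypothetical cutoff chosen for this sketch)
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_ai):
        flagged = score >= threshold
        if truth and flagged:
            tp += 1  # AI-altered article correctly flagged
        elif truth:
            fn += 1  # AI-altered article missed
        elif flagged:
            fp += 1  # human-written article wrongly flagged
        else:
            tn += 1  # human-written article correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative control")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up scores for two positive and two negative controls.
scores = [90.0, 10.0, 60.0, 5.0]
is_ai = [True, True, False, False]
sens, spec = sensitivity_specificity(scores, is_ai, threshold=50.0)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both values depend on the chosen cutoff, which is one reason the interpretation of detector output needs to be standardized before results from different tools can be compared.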

Conclusions: As this study found an early, albeit modest, effect of generative AI on the orthopaedic literature, proper oversight will play a critical role in maintaining research integrity and accuracy. AI detectors may play a critical role in regulatory efforts, although they will require further development and standardization of the interpretation of their results.




Source journal

Journal of the American Academy of Orthopaedic Surgeons (Medicine - Orthopedics)

CiteScore: 6.10

Self-citation rate: 6.20%

Articles published: 529

Review time: 4-8 weeks

About the journal: The Journal of the American Academy of Orthopaedic Surgeons was established in the fall of 1993 by the Academy in response to its membership’s demand for a clinical review journal. Two issues were published the first year, followed by six issues yearly from 1994 through 2004. In September 2005, JAAOS began publishing monthly issues. Each issue includes richly illustrated peer-reviewed articles focused on clinical diagnosis and management. Special features in each issue provide commentary on developments in pharmacotherapeutics, materials and techniques, and computer applications.