Can artificial intelligence generate scientific discussion that passes peer review for publication in a high-impact orthopaedic journal?

IF 1.7 · Zone 4 (Medicine) · JCR Q2, MEDICINE, GENERAL & INTERNAL
Gerard A Sheridan, Lisa C Howard, Michael E Neufeld, Tom R Doyle, Andrew J Hughes, Peter K Sculco, David E Beverland, Donald S Garbuz, Bassam A Masri
Irish Journal of Medical Science. Journal Article. Published 2025-06-12. DOI: 10.1007/s11845-025-03971-y (https://doi.org/10.1007/s11845-025-03971-y)
Citation count: 0

Abstract

Background: There is considerable interest in the use of artificial intelligence (AI) in the production and assessment of academic material; however, the role of AI remains unclear.

Aim: The purpose of this study was to perform a reviewer-blinded assessment of the quality of scientific discussion generated by an advanced AI language model (ChatGPT-4, OpenAI) and to determine whether it could be recommended for publication in a high-impact journal.

Methods: The introduction, methods and results sections of a recently published article from a high-impact journal were input into a current AI model. The AI application then produced a discussion and conclusion based on the provided text using a standardized prompt. Six experienced blinded reviewers scored all five sections of the hybrid article. A one-way analysis of variance (ANOVA) was used to assess significant differences between scores of each section. Reviewers recommended a decision regarding the suitability of the article for publication.
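
The scoring analysis described in the Methods can be illustrated with a short sketch. It computes the median and interquartile range for each section and the F statistic of a one-way ANOVA across sections. The function names (`median_iqr`, `one_way_anova_f`) and every score in `toy_scores` are hypothetical placeholders invented for illustration; they are not the study's data, and the abstract does not state which software or quartile method the authors used.

```python
from statistics import mean, median, quantiles


def median_iqr(scores):
    """Return (median, Q1, Q3) for one section's reviewer scores."""
    # "inclusive" treats the scores as the whole population of reviewers;
    # this quartile convention is an assumption, not taken from the paper.
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return median(scores), q1, q3


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of sections
    n = len(all_scores)   # total number of scores
    # Variation of the section means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own section mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder scores: six reviewers, five sections.
# These numbers are invented and do NOT reproduce the published results.
toy_scores = {
    "introduction": [60, 70, 75, 80, 85, 90],
    "methods": [55, 65, 75, 80, 85, 95],
    "results": [50, 60, 70, 80, 90, 95],
    "discussion": [40, 50, 55, 65, 70, 80],
    "conclusion": [35, 50, 60, 65, 75, 85],
}

for section, scores in toy_scores.items():
    med, q1, q3 = median_iqr(scores)
    print(f"{section}: median {med} (IQR {q1}-{q3})")

f_stat, df_b, df_w = one_way_anova_f(list(toy_scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With five sections each scored by six reviewers, the degrees of freedom are 4 between sections and 25 within sections. Converting the F statistic to a p-value requires the F distribution; a library routine such as `scipy.stats.f_oneway` returns both values directly.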

Results: AI composed a scientific discussion and conclusion. The median score was 80 (IQR 70-90) for introduction, 77.5 (IQR 70-90) for methods, 82.5 (IQR 50-90) for results, 60 (IQR 40-75) for discussion and 60 (IQR 40-80) for the conclusion. The median scores for the AI-generated sections were lower than those for the other sections, but the difference was not statistically significant (p = 0.37). The majority of reviewers (5/6, 83%) recommended "acceptance for publication after major revision". One reviewer recommended "resubmission with no guarantee of acceptance". There were no recommendations for rejection.

Conclusion: Current AI large language models are capable of generating content that passes experienced peer review and is acceptable for publication in a high-impact orthopaedic journal, after revision. Many concerns remain regarding the integration of AI into the process of scientific writing, mainly the tendency of AI to rely on advanced pattern recognition and on fabricated or inadequate references.

Level of evidence: Level IV.

Source journal
Irish Journal of Medical Science (Medicine: General & Internal)
CiteScore: 3.70
Self-citation rate: 4.80%
Articles published: 357
Review time: 4-8 weeks
About the journal: The Irish Journal of Medical Science is the official organ of the Royal Academy of Medicine in Ireland. Established in 1832, this quarterly journal is a contribution to medical science and an ideal forum for the younger medical/scientific professional to enter world literature, and it remains, as in the past, an ideal launching platform for many a young research worker. The primary role of both the Academy and IJMS is to provide a forum for the exchange of scientific information and to promote the academic discussion that is so essential to scientific progress.