Can artificial intelligence generate scientific discussion that passes peer review for publication in a high-impact orthopaedic journal?
Gerard A Sheridan, Lisa C Howard, Michael E Neufeld, Tom R Doyle, Andrew J Hughes, Peter K Sculco, David E Beverland, Donald S Garbuz, Bassam A Masri
Irish Journal of Medical Science. Published 2025-06-12. DOI: 10.1007/s11845-025-03971-y (https://doi.org/10.1007/s11845-025-03971-y)
Publication type: Journal Article. Level of evidence: Level IV.
Abstract
Background: There is huge interest in the use of artificial intelligence (AI) in the production and assessment of academic material; however, the role of AI remains unclear.
Aim: The purpose of this study was to perform a reviewer-blinded assessment of the quality of scientific discussion generated by an advanced AI language model (ChatGPT-4, OpenAI) and to determine whether it could be recommended for publication in a high-impact journal.
Methods: The introduction, methods and results sections of a recently published article from a high-impact journal were input into a current AI model. The AI application then produced a discussion and conclusion based on the provided text using a standardized prompt. Six experienced blinded reviewers scored all five sections of the hybrid article. A one-way analysis of variance (ANOVA) was used to assess significant differences between scores of each section. Reviewers recommended a decision regarding the suitability of the article for publication.
Results: AI composed a scientific discussion and conclusion. The median score was 80 (IQR 70-90) for introduction, 77.5 (IQR 70-90) for methods, 82.5 (IQR 50-90) for results, 60 (IQR 40-75) for discussion and 60 (IQR 40-80) for the conclusion. The median scores for the AI-generated sections were non-significantly lower than other sections (p = 0.37). The majority of reviewers (5/6, 83%) recommended "acceptance for publication after major revision". One reviewer recommended "resubmission with no guarantee of acceptance". There were no recommendations for rejection.
Conclusion: Current AI large language models are now capable of generating content that passes experienced peer review and is acceptable for publication in a high-impact orthopaedic journal, after revision. There are still many concerns regarding the integration of AI into the process of scientific writing, mainly the tendency of AI to rely on advanced pattern recognition and fabricated or inadequate references.
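The Methods describe a one-way ANOVA across the scores of the five sections. As a rough illustration of what that test computes, here is a minimal sketch in plain Python. The function name `one_way_anova_f` and the score values are hypothetical and chosen only for illustration; they are not the study's data, and the authors' actual software and settings are not stated in the abstract.

```python
from statistics import mean, median


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists.

    Illustrative helper (hypothetical name): F is the ratio of
    between-group to within-group mean squares.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)          # number of groups (article sections)
    n = len(all_scores)      # total number of scores

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores (0-100) from six reviewers per section; NOT the study's data.
scores = {
    "introduction": [70, 80, 80, 90, 75, 85],
    "methods": [70, 75, 80, 90, 70, 85],
    "results": [50, 80, 85, 90, 60, 90],
    "discussion": [40, 55, 60, 75, 60, 70],
    "conclusion": [40, 60, 60, 80, 50, 70],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
for section, values in scores.items():
    print(f"{section}: median = {median(values)}")
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution, which the standard library does not provide; `scipy.stats.f_oneway` returns both values directly if SciPy is available.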
About the journal:
The Irish Journal of Medical Science is the official organ of the Royal Academy of Medicine in Ireland. Established in 1832, this quarterly journal is a contribution to medical science and an ideal forum for the younger medical/scientific professional to enter world literature and an ideal launching platform now, as in the past, for many a young research worker.
The primary role of both the Academy and IJMS is to provide a forum for the exchange of scientific information and to promote academic discussion, so essential to scientific progress.