From Algorithms to Academia: An Endeavor to Benchmark AI-Generated Scientific Papers against Human Standards.

Impact factor: 1.2 | JCR quartile: Q3 | Category: Orthopedics
Jackson Woodrow, Nour Nassour, John Y Kwon, Soheil Ashkani-Esfahani, Mitchel Harris
Archives of Bone and Joint Surgery (ABJS), 13(4):212-222. Published 2025-01-01. Publication type: Journal Article.
DOI: 10.22038/ABJS.2024.80093.3669 (https://doi.org/10.22038/ABJS.2024.80093.3669)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12050080/pdf/

Abstract

Objectives: This study aims to quantitatively investigate the accuracy of text generated by AI large language models and to compare its readability and likelihood of acceptance to a scientific journal with those of human-authored papers on the same topics.

Methods: The study consisted of two papers written by ChatGPT, two papers written by Assistant by scite, and two papers written by humans. A total of six independent reviewers were blinded to the authorship of each paper and assigned a grade to each subsection on a scale of 1 to 4. Additionally, each reviewer was asked to guess if the paper was written by a human or AI and explain their reasoning. The study authors also graded each AI-generated paper based on factual accuracy of the claims and citations.

Results: The human-written calcaneus fracture paper received the highest score (3.70/4), followed by the Assistant-written calcaneus fracture paper (3.02/4), the human-written ankle osteoarthritis paper (2.98/4), the ChatGPT calcaneus fracture paper (2.89/4), the ChatGPT ankle osteoarthritis paper (2.87/4), and the Assistant ankle osteoarthritis paper (2.78/4). The human calcaneus fracture paper received a statistically significantly higher rating than the ChatGPT calcaneus fracture paper (P = 0.028) and the Assistant calcaneus fracture paper (P = 0.043). For factual accuracy, the ChatGPT ankle osteoarthritis review scored 100%, the ChatGPT calcaneus fracture review 97.46%, the Assistant calcaneus fracture review 95.56%, and the Assistant ankle osteoarthritis review 94.98%. For citation accuracy, the ChatGPT ankle osteoarthritis paper scored 90%, the ChatGPT calcaneus fracture paper 69.23%, the Assistant calcaneus fracture paper 39.68%, and the Assistant ankle osteoarthritis paper 35.14%.

Conclusion: Through this paper we emphasize that while AI holds the promise of enhancing knowledge sharing, it must be used responsibly and in conjunction with comprehensive fact-checking procedures to maintain the integrity of the scientific discourse.
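
The figures reported in the Results can be tabulated to make the contrast between factual accuracy and citation accuracy easier to see. The following is a minimal sketch, not part of the study: the dictionary layout, the field names, and the helper functions `ranked_by_score` and `accuracy_gap` are illustrative assumptions, and only the numbers are taken from the abstract.

```python
# Values transcribed from the Results paragraph of the abstract.
# The structure and field names are illustrative assumptions, not the authors' data format.
# "factual" and "citation" are percentages; None means the abstract reports no value
# (the human-written papers were not graded for accuracy).
reported = {
    ("Human", "calcaneus fracture"): {"score": 3.70, "factual": None, "citation": None},
    ("Assistant", "calcaneus fracture"): {"score": 3.02, "factual": 95.56, "citation": 39.68},
    ("Human", "ankle osteoarthritis"): {"score": 2.98, "factual": None, "citation": None},
    ("ChatGPT", "calcaneus fracture"): {"score": 2.89, "factual": 97.46, "citation": 69.23},
    ("ChatGPT", "ankle osteoarthritis"): {"score": 2.87, "factual": 100.0, "citation": 90.0},
    ("Assistant", "ankle osteoarthritis"): {"score": 2.78, "factual": 94.98, "citation": 35.14},
}


def ranked_by_score(papers):
    """Return paper keys ordered from highest to lowest mean reviewer score (out of 4)."""
    return sorted(papers, key=lambda key: papers[key]["score"], reverse=True)


def accuracy_gap(papers):
    """Factual accuracy minus citation accuracy, in percentage points, for AI-generated papers."""
    return {
        key: round(values["factual"] - values["citation"], 2)
        for key, values in papers.items()
        if values["factual"] is not None
    }


if __name__ == "__main__":
    for author, topic in ranked_by_score(reported):
        print(f"{reported[(author, topic)]['score']:.2f}/4  {author}, {topic}")
    for (author, topic), gap in accuracy_gap(reported).items():
        print(f"{author}, {topic}: factual exceeds citation accuracy by {gap} points")
```

Read this way, every AI-generated paper has higher factual accuracy than citation accuracy, and the gap is widest for the two Assistant papers, which is consistent with the Conclusion's call for fact-checking.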

Source journal: Archives of Bone and Joint Surgery (ABJS)
CiteScore: 2.30
Self-citation rate: 0.00%
Articles published: 128
Journal description: The Archives of Bone and Joint Surgery (ABJS) aims to encourage a better understanding of all aspects of orthopedic sciences. The journal accepts scientific papers, including original research, review articles, short communications, case reports, and letters to the editor, in all fields of bone, joint, and musculoskeletal surgery and related research. ABJS publishes papers on all aspects of modern orthopedic sciences, including: Arthroscopy, Arthroplasty, Sport Medicine, Reconstruction, Hand and Upper Extremity, Pediatric Orthopedics, Spine, Trauma, Foot and Ankle, Tumor, Joint Rheumatic Disease, Skeletal Imaging, Orthopedic Physical Therapy, Rehabilitation, and Orthopedic Basic Sciences (Biomechanics, Biotechnology, Biomaterial, etc.).