Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?

IF 3.1 | CAS: Medicine, Tier 3 | JCR: Q2, Clinical Neurology
Burak Berksu Ozkara, Alexandre Boutet, Bryan A Comstock, Johan Van Goethem, Thierry A G M Huisman, Jeffrey S Ross, Luca Saba, Lubdha M Shah, Max Wintermark, Mauricio Castillo
{"title":"人工智能生成的放射学社论:专家编辑能发现它们吗?","authors":"Burak Berksu Ozkara,Alexandre Boutet,Bryan A Comstock,Johan Van Goethem,Thierry A G M Huisman,Jeffrey S Ross,Luca Saba,Lubdha M Shah,Max Wintermark,Mauricio Castillo","doi":"10.3174/ajnr.a8505","DOIUrl":null,"url":null,"abstract":"BACKGROUND AND PURPOSE\r\nWe aimed to evaluate GPT-4's ability to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing.\r\n\r\nMATERIALS AND METHODS\r\nSixteen editorials from eight journals were included. To generate the AI-written editorials, the summary of 16 human-written editorials was fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used. The raters were asked to evaluate the content of each article using a 1-5 Likert scale across specified metrics. Then, they determined whether the editorials were written by humans or AI. The articles were then evaluated in pairs to determine which article was generated by AI and which should be published. Finally, the articles were analyzed with an AI detector and for plagiarism.\r\n\r\nRESULTS\r\nThe human-written articles had a median AI probability score of 2.0%, whereas the AI-written articles had 58%. The median similarity score among AI-written articles was 3%. 58% of unpaired articles were correctly classified regarding authorship. Rating accuracy was increased to 70% in the paired setting. AI-written articles received slightly higher scores in most metrics. When stratified by perception, human-written perceived articles were rated higher in most categories. In the paired setting, raters strongly preferred publishing the article they perceived as human-written (82%).\r\n\r\nCONCLUSIONS\r\nGPT-4 can write high-quality articles that iThenticate does not flag as plagiarized, which may go undetected by editors, and that detection tools can detect to a limited extent. Editors showed a positive bias toward human-written articles.\r\n\r\nABBREVIATIONS\r\nAI = Artificial intelligence; LLM = large language model; SD = standard deviation.","PeriodicalId":7875,"journal":{"name":"American Journal of Neuroradiology","volume":"65 1","pages":""},"PeriodicalIF":3.1000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence-Generated Editorials in Radiology: Can Expert Editors Detect Them?\",\"authors\":\"Burak Berksu Ozkara,Alexandre Boutet,Bryan A Comstock,Johan Van Goethem,Thierry A G M Huisman,Jeffrey S Ross,Luca Saba,Lubdha M Shah,Max Wintermark,Mauricio Castillo\",\"doi\":\"10.3174/ajnr.a8505\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"BACKGROUND AND PURPOSE\\r\\nWe aimed to evaluate GPT-4's ability to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing.\\r\\n\\r\\nMATERIALS AND METHODS\\r\\nSixteen editorials from eight journals were included. To generate the AI-written editorials, the summary of 16 human-written editorials was fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used. The raters were asked to evaluate the content of each article using a 1-5 Likert scale across specified metrics. Then, they determined whether the editorials were written by humans or AI. 
The articles were then evaluated in pairs to determine which article was generated by AI and which should be published. Finally, the articles were analyzed with an AI detector and for plagiarism.\\r\\n\\r\\nRESULTS\\r\\nThe human-written articles had a median AI probability score of 2.0%, whereas the AI-written articles had 58%. The median similarity score among AI-written articles was 3%. 58% of unpaired articles were correctly classified regarding authorship. Rating accuracy was increased to 70% in the paired setting. AI-written articles received slightly higher scores in most metrics. When stratified by perception, human-written perceived articles were rated higher in most categories. In the paired setting, raters strongly preferred publishing the article they perceived as human-written (82%).\\r\\n\\r\\nCONCLUSIONS\\r\\nGPT-4 can write high-quality articles that iThenticate does not flag as plagiarized, which may go undetected by editors, and that detection tools can detect to a limited extent. Editors showed a positive bias toward human-written articles.\\r\\n\\r\\nABBREVIATIONS\\r\\nAI = Artificial intelligence; LLM = large language model; SD = standard deviation.\",\"PeriodicalId\":7875,\"journal\":{\"name\":\"American Journal of Neuroradiology\",\"volume\":\"65 1\",\"pages\":\"\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Neuroradiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.3174/ajnr.a8505\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CLINICAL NEUROLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Neuroradiology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.3174/ajnr.a8505","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CLINICAL NEUROLOGY","Score":null,"Total":0}
Citations: 0

Abstract

BACKGROUND AND PURPOSE
We aimed to evaluate GPT-4's ability to write radiology editorials and to compare these with human-written counterparts, thereby determining their real-world applicability for scientific writing.

MATERIALS AND METHODS
Sixteen editorials from eight journals were included. To generate the AI-written editorials, the summary of each of the 16 human-written editorials was fed into GPT-4. Six experienced editors reviewed the articles. First, an unpaired approach was used: the raters evaluated the content of each article on a 1-5 Likert scale across specified metrics and then judged whether each editorial was written by a human or by AI. The articles were then evaluated in pairs to determine which article was AI-generated and which should be published. Finally, the articles were run through an AI detector and checked for plagiarism.

RESULTS
The human-written articles had a median AI-probability score of 2.0%, whereas the AI-written articles had a median of 58%. The median similarity (plagiarism) score among AI-written articles was 3%. In the unpaired setting, 58% of articles were correctly classified by authorship; accuracy rose to 70% in the paired setting. AI-written articles received slightly higher scores on most metrics, but when stratified by perceived authorship, articles perceived as human-written were rated higher in most categories. In the paired setting, raters strongly preferred to publish the article they perceived as human-written (82%).

CONCLUSIONS
GPT-4 can write high-quality editorials that iThenticate does not flag as plagiarized, that editors may fail to detect, and that AI-detection tools identify only to a limited extent. Editors showed a positive bias toward human-written articles.

ABBREVIATIONS
AI = artificial intelligence; LLM = large language model; SD = standard deviation.
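The paper does not publish its prompts or code, but the generation step described in the Materials and Methods (feeding the summary of each human-written editorial into GPT-4) maps onto a short script. The following is a minimal sketch assuming the OpenAI Python SDK; the prompt wording and the `summaries` placeholder are hypothetical illustrations, not the authors' actual protocol.

```python
# Hypothetical sketch of the generation step described in the abstract:
# the summary of each human-written editorial is fed to GPT-4, which
# writes a new editorial on the same topic. Prompt text is an assumption;
# the paper does not publish its prompts or the exact model snapshot.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_editorial(summary: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # the study used GPT-4; snapshot unspecified
        messages=[
            {"role": "system",
             "content": "You are an experienced neuroradiology editor."},
            {"role": "user",
             "content": ("Write a journal editorial based on this summary "
                         "of a published editorial:\n\n" + summary)},
        ],
    )
    return response.choices[0].message.content


# One AI-written counterpart per human-written editorial (16 in the study).
summaries = ["<summary of human-written editorial 1>", "..."]
ai_editorials = [generate_editorial(s) for s in summaries]
```

Under this design, the unpaired detection accuracy reported in the Results is simply the share of the editors' human-vs.-AI calls that match the true source (58%), and the paired accuracy (70%) is the share of pairs in which the AI-written member was correctly identified.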
Source journal: American Journal of Neuroradiology
CiteScore: 7.10
Self-citation rate: 5.70%
Articles per year: 506
Review turnaround: 2 months
Journal description: The mission of AJNR is to further knowledge in all aspects of neuroimaging, head and neck imaging, and spine imaging for neuroradiologists, radiologists, trainees, scientists, and associated professionals through print and/or electronic publication of quality peer-reviewed articles that lead to the highest standards in patient care, research, and education, and to promote discussion of these and other issues through its electronic activities.