Evaluating the Efficacy of Perplexity Scores in Distinguishing AI-Generated and Human-Written Abstracts

IF 3.8 | Tier 2 (Medicine) | Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Alperen Elek, Hatice Sude Yildiz, Benan Akca, Nisa Cem Oren, Batuhan Gundogdu
{"title":"评估困惑分数在区分人工智能生成和人类撰写摘要方面的功效。","authors":"Alperen Elek ,&nbsp;Hatice Sude Yildiz ,&nbsp;Benan Akca ,&nbsp;Nisa Cem Oren ,&nbsp;Batuhan Gundogdu","doi":"10.1016/j.acra.2025.01.017","DOIUrl":null,"url":null,"abstract":"<div><h3>Rationale and Objectives</h3><div>We aimed to evaluate the efficacy of perplexity scores in distinguishing between human-written and AI-generated radiology abstracts and to assess the relative performance of available AI detection tools in detecting AI-generated content.</div></div><div><h3>Methods</h3><div>Academic articles were curated from PubMed using the keywords \"neuroimaging\" and \"angiography.\" Filters included English-language, open-access articles with abstracts without subheadings, published before 2021, and within Chatbot processing word limits. The first 50 qualifying articles were selected, and their full texts were used to create AI-generated abstracts. Perplexity scores, which estimate sentence predictability, were calculated for both AI-generated and human-written abstracts. The performance of three AI tools in discriminating human-written from AI-generated abstracts was assessed.</div></div><div><h3>Results</h3><div>The selected 50 articles consist of 22 review articles (44%), 12 case or technical reports (24%), 15 research articles (30%), and one editorial (2%). The perplexity scores for human-written abstracts (median; 35.9 IQR; 25.11–51.8) were higher than those for AI-generated abstracts (median; 21.2 IQR; 16.87–28.38), (p<!--> <!-->=<!--> <!-->0.057) with an AUC<!--> <!-->=<!--> <!-->0.7794. One AI tool performed less than chance in identifying human-written from AI-generated abstracts with an accuracy of 36% (p<!--> <!-->&gt;<!--> <!-->0.05) while another tool yielded an accuracy of 95% with an AUC<!--> <!-->=<!--> <!-->0.8688.</div></div><div><h3>Conclusion</h3><div>This study underscores the potential of perplexity scores in detecting AI-generated and potentially fraudulent abstracts. However, more research is needed to further explore these findings and their implications for the use of AI in academic writing. Future studies could also investigate other metrics or methods for distinguishing between human-written and AI-generated texts.</div></div>","PeriodicalId":50928,"journal":{"name":"Academic Radiology","volume":"32 4","pages":"Pages 1785-1790"},"PeriodicalIF":3.8000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluating the Efficacy of Perplexity Scores in Distinguishing AI-Generated and Human-Written Abstracts\",\"authors\":\"Alperen Elek ,&nbsp;Hatice Sude Yildiz ,&nbsp;Benan Akca ,&nbsp;Nisa Cem Oren ,&nbsp;Batuhan Gundogdu\",\"doi\":\"10.1016/j.acra.2025.01.017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Rationale and Objectives</h3><div>We aimed to evaluate the efficacy of perplexity scores in distinguishing between human-written and AI-generated radiology abstracts and to assess the relative performance of available AI detection tools in detecting AI-generated content.</div></div><div><h3>Methods</h3><div>Academic articles were curated from PubMed using the keywords \\\"neuroimaging\\\" and \\\"angiography.\\\" Filters included English-language, open-access articles with abstracts without subheadings, published before 2021, and within Chatbot processing word limits. The first 50 qualifying articles were selected, and their full texts were used to create AI-generated abstracts. 
Perplexity scores, which estimate sentence predictability, were calculated for both AI-generated and human-written abstracts. The performance of three AI tools in discriminating human-written from AI-generated abstracts was assessed.</div></div><div><h3>Results</h3><div>The selected 50 articles consist of 22 review articles (44%), 12 case or technical reports (24%), 15 research articles (30%), and one editorial (2%). The perplexity scores for human-written abstracts (median; 35.9 IQR; 25.11–51.8) were higher than those for AI-generated abstracts (median; 21.2 IQR; 16.87–28.38), (p<!--> <!-->=<!--> <!-->0.057) with an AUC<!--> <!-->=<!--> <!-->0.7794. One AI tool performed less than chance in identifying human-written from AI-generated abstracts with an accuracy of 36% (p<!--> <!-->&gt;<!--> <!-->0.05) while another tool yielded an accuracy of 95% with an AUC<!--> <!-->=<!--> <!-->0.8688.</div></div><div><h3>Conclusion</h3><div>This study underscores the potential of perplexity scores in detecting AI-generated and potentially fraudulent abstracts. However, more research is needed to further explore these findings and their implications for the use of AI in academic writing. Future studies could also investigate other metrics or methods for distinguishing between human-written and AI-generated texts.</div></div>\",\"PeriodicalId\":50928,\"journal\":{\"name\":\"Academic Radiology\",\"volume\":\"32 4\",\"pages\":\"Pages 1785-1790\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Academic Radiology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1076633225000170\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Academic Radiology","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1076633225000170","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract

Rationale and Objectives

We aimed to evaluate the efficacy of perplexity scores in distinguishing between human-written and AI-generated radiology abstracts and to assess the relative performance of available AI detection tools in detecting AI-generated content.

Methods

Academic articles were curated from PubMed using the keywords "neuroimaging" and "angiography." Filters restricted the search to English-language, open-access articles published before 2021, with abstracts lacking subheadings and full texts within the chatbot's processing word limit. The first 50 qualifying articles were selected, and their full texts were used to create AI-generated abstracts. Perplexity scores, which estimate sentence predictability, were calculated for both AI-generated and human-written abstracts. The performance of three AI tools in discriminating human-written from AI-generated abstracts was assessed.
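The abstract does not state which language model was used to score perplexity; the sketch below is a minimal illustration of how such a score could be computed, assuming an off-the-shelf GPT-2 model via the Hugging Face transformers library. The model name and the placeholder abstract texts are assumptions for demonstration only.

```python
# Minimal sketch: perplexity of an abstract under a pretrained language model.
# GPT-2 is an illustrative assumption, not necessarily the scorer used in the study.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_NAME = "gpt2"  # assumed scoring model
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_NAME)
model = GPT2LMHeadModel.from_pretrained(MODEL_NAME)
model.eval()

def perplexity(text: str) -> float:
    """Return exp(mean per-token cross-entropy) of the text under the model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # out.loss is the mean negative log-likelihood per token; its exponential is perplexity
    return float(torch.exp(out.loss))

# Placeholder inputs; in the study each abstract pair came from the same source article.
human_ppl = perplexity("Human-written abstract text goes here.")
ai_ppl = perplexity("AI-generated abstract text goes here.")
print(f"human: {human_ppl:.1f}  ai: {ai_ppl:.1f}")
```

Lower perplexity indicates text the model finds more predictable, which is the property the study exploits to flag AI-generated abstracts.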

Results

The 50 selected articles comprised 22 review articles (44%), 12 case or technical reports (24%), 15 research articles (30%), and one editorial (2%). Perplexity scores for human-written abstracts (median 35.9; IQR 25.11–51.8) were higher than those for AI-generated abstracts (median 21.2; IQR 16.87–28.38) (p = 0.057), with an AUC of 0.7794. One AI tool performed below chance in distinguishing human-written from AI-generated abstracts, with an accuracy of 36% (p > 0.05), while another tool achieved an accuracy of 95% with an AUC of 0.8688.
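The abstract does not detail the statistical procedure behind the reported p-value and AUC. A plausible reading is a nonparametric comparison of the two perplexity distributions plus an ROC analysis that treats perplexity as a score for the "human-written" class; the sketch below illustrates that reading with hypothetical placeholder values, not the study's data.

```python
# Minimal sketch: group comparison and ROC AUC on perplexity scores.
# The test choice (Mann-Whitney U) and the numbers are assumptions for illustration.
import numpy as np
from scipy.stats import mannwhitneyu
from sklearn.metrics import roc_auc_score

# Hypothetical perplexity scores; the study had 50 abstracts per group.
human_ppl = np.array([35.9, 41.2, 28.7, 52.3, 30.1])
ai_ppl = np.array([21.2, 18.4, 25.6, 16.9, 27.8])

# Nonparametric comparison of the two distributions.
stat, p_value = mannwhitneyu(human_ppl, ai_ppl, alternative="two-sided")

# ROC AUC with label 1 = human-written; higher perplexity should indicate "human".
labels = np.concatenate([np.ones(len(human_ppl)), np.zeros(len(ai_ppl))])
scores = np.concatenate([human_ppl, ai_ppl])
auc = roc_auc_score(labels, scores)

print(f"Mann-Whitney p = {p_value:.3f}, AUC = {auc:.4f}")
```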

Conclusion

This study underscores the potential of perplexity scores in detecting AI-generated and potentially fraudulent abstracts. However, more research is needed to further explore these findings and their implications for the use of AI in academic writing. Future studies could also investigate other metrics or methods for distinguishing between human-written and AI-generated texts.
Source journal: Academic Radiology (Medicine, Nuclear Medicine)
CiteScore: 7.60
Self-citation rate: 10.40%
Articles published: 432
Review time: 18 days
Journal description: Academic Radiology publishes original reports of clinical and laboratory investigations in diagnostic imaging, the diagnostic use of radioactive isotopes, computed tomography, positron emission tomography, magnetic resonance imaging, ultrasound, digital subtraction angiography, image-guided interventions and related techniques. It also includes brief technical reports describing original observations, techniques, and instrumental developments; state-of-the-art reports on clinical issues, new technology and other topics of current medical importance; meta-analyses; scientific studies and opinions on radiologic education; and letters to the Editor.