AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing

IF 4.0 | Region 1 (Philosophy) | Q1 MEDICAL ETHICS
Andrey A Popkov, Tyson S Barrett
{"title":"人工智能与学术界:关于人工智能文本检测器在行为健康学术写作中准确性的实验研究。","authors":"Andrey A Popkov, Tyson S Barrett","doi":"10.1080/08989621.2024.2331757","DOIUrl":null,"url":null,"abstract":"<p><p>Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by \"ChatGPT\" and 100 by \"Claude\"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.</p>","PeriodicalId":50927,"journal":{"name":"Accountability in Research-Policies and Quality Assurance","volume":" ","pages":"1072-1088"},"PeriodicalIF":4.0000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AI vs academia: Experimental study on AI text detectors' accuracy in behavioral health academic writing.\",\"authors\":\"Andrey A Popkov, Tyson S Barrett\",\"doi\":\"10.1080/08989621.2024.2331757\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by \\\"ChatGPT\\\" and 100 by \\\"Claude\\\"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. 
These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.</p>\",\"PeriodicalId\":50927,\"journal\":{\"name\":\"Accountability in Research-Policies and Quality Assurance\",\"volume\":\" \",\"pages\":\"1072-1088\"},\"PeriodicalIF\":4.0000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Accountability in Research-Policies and Quality Assurance\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1080/08989621.2024.2331757\",\"RegionNum\":1,\"RegionCategory\":\"哲学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/3/22 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"MEDICAL ETHICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accountability in Research-Policies and Quality Assurance","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/08989621.2024.2331757","RegionNum":1,"RegionCategory":"哲学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/22 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"MEDICAL ETHICS","Score":null,"Total":0}
Citations: 0

Abstract


Artificial Intelligence (AI) language models continue to expand in both access and capability. As these models have evolved, the number of academic journals in medicine and healthcare which have explored policies regarding AI-generated text has increased. The implementation of such policies requires accurate AI detection tools. Inaccurate detectors risk unnecessary penalties for human authors and/or may compromise the effective enforcement of guidelines against AI-generated content. Yet, the accuracy of AI text detection tools in identifying human-written versus AI-generated content has been found to vary across published studies. This experimental study used a sample of behavioral health publications and found problematic false positive and false negative rates from both free and paid AI detection tools. The study assessed 100 research articles from 2016-2018 in behavioral health and psychiatry journals and 200 texts produced by AI chatbots (100 by "ChatGPT" and 100 by "Claude"). The free AI detector showed a median of 27.2% for the proportion of academic text identified as AI-generated, while commercial software Originality.AI demonstrated better performance but still had limitations, especially in detecting texts generated by Claude. These error rates raise doubts about relying on AI detectors to enforce strict policies around AI text generation in behavioral health publications.
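For readers curious how the error rates described above are typically tabulated, the sketch below computes a median AI-generated proportion for human-written texts along with false positive and false negative rates from per-document detector scores. The scores, threshold, and field names are hypothetical illustrations under assumed conventions, not the authors' actual data or pipeline.

```python
# Hypothetical sketch of the error-rate computations described in the abstract.
# The detector scores, threshold, and field names are illustrative assumptions.
from statistics import median

# Each record: proportion of the text a detector flagged as AI-generated (0-100),
# plus the true origin of the text.
samples = [
    {"origin": "human",   "ai_score": 27.2},  # human-written journal article
    {"origin": "human",   "ai_score": 5.0},
    {"origin": "chatgpt", "ai_score": 88.0},  # text generated by ChatGPT
    {"origin": "claude",  "ai_score": 35.0},  # text generated by Claude
]

THRESHOLD = 50.0  # assumed cut-off for labeling a text "AI-generated"

def detector_rates(records, threshold=THRESHOLD):
    human = [r["ai_score"] for r in records if r["origin"] == "human"]
    ai = [r["ai_score"] for r in records if r["origin"] != "human"]
    false_positives = sum(score >= threshold for score in human)  # humans wrongly flagged
    false_negatives = sum(score < threshold for score in ai)      # AI texts missed
    return {
        "median_human_ai_score": median(human),  # cf. the 27.2% median reported in the study
        "false_positive_rate": false_positives / len(human),
        "false_negative_rate": false_negatives / len(ai),
    }

print(detector_rates(samples))
```

In this framing, a high median score on genuinely human-written articles drives the false positive rate, while AI-generated texts scoring below the threshold (as reported for Claude outputs) drive the false negative rate.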

Source journal
CiteScore: 4.90
Self-citation rate: 14.70%
Articles published: 49
Review time: >12 weeks
Journal description: Accountability in Research: Policies and Quality Assurance is devoted to the examination and critical analysis of systems for maximizing integrity in the conduct of research. It provides an interdisciplinary, international forum for the development of ethics, procedures, standards, policies, and concepts to encourage the ethical conduct of research and to enhance the validity of research results. The journal welcomes views on advancing the integrity of research in the fields of general and multidisciplinary sciences, medicine, law, economics, statistics, management studies, public policy, politics, sociology, history, psychology, philosophy, ethics, and information science. All submitted manuscripts are subject to initial appraisal by the Editor, and if found suitable for further consideration, to peer review by independent, anonymous expert referees.