Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses☆

Impact Factor: 1.3 · CAS Zone 4 (Economics) · JCR Q3 (ECONOMICS)
Alexandra Fiedler, Jörg Döpke
{"title":"Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses☆","authors":"Alexandra Fiedler ,&nbsp;Jörg Döpke","doi":"10.1016/j.iree.2025.100321","DOIUrl":null,"url":null,"abstract":"<div><div>We investigate whether human experts can identify AI-generated academic texts more accurately than current machine-based detectors. Conducted as a survey experiment at a German university of applied sciences, 63 lecturers in engineering, economics, and social sciences were asked to evaluate short excerpts (200–300 words) from both human-generated and AI-generated texts. These texts varied by discipline and writing level (student vs. professional) with the AI-generated content. The results show that both human evaluators and AI detectors correctly identified AI-generated texts only slightly better than chance, with humans achieving a recognition rate of 57 % for AI texts and 64 % for human-generated texts. There was no statistically significant difference between human and machine performance. Notably, professional-level AI texts were the most difficult to identify, with less than 20 % of respondents correctly classifying them. Regression analyses suggest that prior teaching experience slightly improves recognition accuracy, while subjective judgments of text quality were not influenced by actual or presumed authorship. These findings suggest that current written examination practices are increasingly vulnerable to undetected AI use. Both human judgment and existing AI detectors show high error rates, especially for high-quality AI-generated content. We conclude that a reconsideration of traditional assessment formats in academia is warranted.</div></div>","PeriodicalId":45496,"journal":{"name":"International Review of Economics Education","volume":"49 ","pages":"Article 100321"},"PeriodicalIF":1.3000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Review of Economics Education","FirstCategoryId":"96","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1477388025000131","RegionNum":4,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ECONOMICS","Score":null,"Total":0}
Citations: 0

Abstract

We investigate whether human experts can identify AI-generated academic texts more accurately than current machine-based detectors. In a survey experiment at a German university of applied sciences, 63 lecturers in engineering, economics, and social sciences evaluated short excerpts (200–300 words) of both human-written and AI-generated texts. The texts varied by discipline, and the AI-generated texts additionally varied by writing level (student vs. professional). The results show that both human evaluators and AI detectors identified AI-generated texts only slightly better than chance, with humans achieving a recognition rate of 57% for AI texts and 64% for human-written texts. There was no statistically significant difference between human and machine performance. Notably, professional-level AI texts were the most difficult to identify: fewer than 20% of respondents classified them correctly. Regression analyses suggest that prior teaching experience slightly improves recognition accuracy, while subjective judgments of text quality were not influenced by actual or presumed authorship. These findings suggest that current written examination practices are increasingly vulnerable to undetected AI use. Both human judgment and existing AI detectors show high error rates, especially for high-quality AI-generated content. We conclude that a reconsideration of traditional assessment formats in academia is warranted.
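The abstract's central statistical claims, that recognition is only slightly above chance and that humans and machines do not differ significantly, can be illustrated with standard proportion tests. The sketch below is not the authors' code; the sample sizes and the detector rate are hypothetical, since the abstract reports only the human recognition rates (57% and 64%).

```python
# Minimal sketch (not the authors' analysis): testing whether a
# recognition rate such as the reported 57% for AI texts exceeds the
# 50% chance level, via an exact binomial test. Trial counts are
# hypothetical -- the abstract does not report them.
from scipy.stats import binomtest

n_trials = 300                      # hypothetical number of AI-text classifications
k_correct = round(0.57 * n_trials)  # 57% reported human recognition rate

# H0: raters classify at chance (p = 0.5); H1: better than chance.
result = binomtest(k_correct, n_trials, p=0.5, alternative="greater")
print(f"correct: {k_correct}/{n_trials}, p = {result.pvalue:.4f}")

# Comparing human vs. machine accuracy could use a two-proportion
# z-test; the detector count (55%) is likewise hypothetical.
from statsmodels.stats.proportion import proportions_ztest

z_stat, p_val = proportions_ztest(count=[171, 165], nobs=[300, 300])
print(f"human vs. detector: z = {z_stat:.2f}, p = {p_val:.4f}")
```

With rates this close to 0.5, such tests return small effect sizes and, at modest sample sizes, non-significant differences, which is consistent with the paper's conclusion that neither humans nor detectors reliably identify AI-generated text.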
Source journal metrics: CiteScore 4.40 · Self-citation rate 4.80% · Articles published 26 · Review time 28 days