{"title":"Do humans identify AI-generated text better than machines? Evidence based on excerpts from German theses☆","authors":"Alexandra Fiedler , Jörg Döpke","doi":"10.1016/j.iree.2025.100321","DOIUrl":null,"url":null,"abstract":"<div><div>We investigate whether human experts can identify AI-generated academic texts more accurately than current machine-based detectors. Conducted as a survey experiment at a German university of applied sciences, 63 lecturers in engineering, economics, and social sciences were asked to evaluate short excerpts (200–300 words) from both human-generated and AI-generated texts. These texts varied by discipline and writing level (student vs. professional) with the AI-generated content. The results show that both human evaluators and AI detectors correctly identified AI-generated texts only slightly better than chance, with humans achieving a recognition rate of 57 % for AI texts and 64 % for human-generated texts. There was no statistically significant difference between human and machine performance. Notably, professional-level AI texts were the most difficult to identify, with less than 20 % of respondents correctly classifying them. Regression analyses suggest that prior teaching experience slightly improves recognition accuracy, while subjective judgments of text quality were not influenced by actual or presumed authorship. These findings suggest that current written examination practices are increasingly vulnerable to undetected AI use. Both human judgment and existing AI detectors show high error rates, especially for high-quality AI-generated content. We conclude that a reconsideration of traditional assessment formats in academia is warranted.</div></div>","PeriodicalId":45496,"journal":{"name":"International Review of Economics Education","volume":"49 ","pages":"Article 100321"},"PeriodicalIF":1.3000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Review of Economics Education","FirstCategoryId":"96","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1477388025000131","RegionNum":4,"RegionCategory":"经济学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ECONOMICS","Score":null,"Total":0}
Abstract
We investigate whether human experts can identify AI-generated academic texts more accurately than current machine-based detectors. In a survey experiment at a German university of applied sciences, 63 lecturers in engineering, economics, and social sciences evaluated short excerpts (200–300 words) from both human-written and AI-generated texts. The texts varied by discipline and, for the AI-generated content, by writing level (student vs. professional). The results show that both human evaluators and AI detectors identified AI-generated texts only slightly better than chance, with humans achieving a recognition rate of 57 % for AI texts and 64 % for human-written texts. There was no statistically significant difference between human and machine performance. Notably, professional-level AI texts were the most difficult to identify, with less than 20 % of respondents classifying them correctly. Regression analyses suggest that prior teaching experience slightly improves recognition accuracy, while subjective judgments of text quality were not influenced by actual or presumed authorship. These findings suggest that current written examination practices are increasingly vulnerable to undetected AI use. Both human judgment and existing AI detectors show high error rates, especially for high-quality AI-generated content. We conclude that a reconsideration of traditional assessment formats in academia is warranted.
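The abstract's claim that recognition is "only slightly better than chance" can be checked with a simple binomial test against the 50 % chance level. The sketch below is illustrative only and is not taken from the paper: the number of classified excerpts (n_trials) is a hypothetical value chosen for demonstration, and only the 57 % hit rate is from the abstract.

```python
# Illustrative sketch (not the authors' analysis): testing whether an observed
# recognition rate for AI-generated texts exceeds the 50% chance level.
from scipy.stats import binomtest

n_trials = 200          # hypothetical number of classified AI-generated excerpts
hit_rate = 0.57         # recognition rate for AI texts reported in the abstract
n_correct = round(hit_rate * n_trials)

# One-sided binomial test: H0: p = 0.5 (guessing), H1: p > 0.5
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(f"correct: {n_correct}/{n_trials}, p-value: {result.pvalue:.3f}")
```

With modest samples, a rate near 57 % often fails to differ significantly from guessing, which is consistent with the paper's conclusion that detection performance is close to chance.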