AI-generated text detection: A comprehensive review of methods, datasets, and applications

IF 12.7 · Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Computer Science Review Pub Date : 2025-08-06 DOI:10.1016/j.cosrev.2025.100793

Tanzila Kehkashan, Raja Adil Riaz, Ahmad Sami Al-Shamayleh, Adnan Akhunzada, Noman Ali, Muhammad Hamza, Faheem Akbar


Citation count: 0

Abstract

This review examines the rapidly evolving field of AI-generated text detection, which has gained critical importance following the widespread deployment of advanced large language models like ChatGPT. We analyze the technical foundations, methodological approaches, evaluation frameworks, and practical applications of detection technologies designed to distinguish between human and machine-authored content. The paper synthesizes current knowledge across key dimensions: detection techniques ranging from statistical approaches to neural architectures, datasets and their limitations, performance metrics and evaluation challenges, real-world implementations across educational, publishing, and legal domains, and emerging research directions. Our analysis reveals significant challenges, including the inherent adversarial nature of detection, cross-domain generalization difficulties, and fairness concerns regarding certain writer populations. We identify promising trends toward multi-scale analysis, human-AI collaborative frameworks, and complementary provenance-based approaches. The review concludes that effective detection remains feasible but requires combining multiple approaches, domain-specific customization, and attention to ethical implications. This comprehensive examination serves as a resource for researchers, practitioners, and policymakers navigating the complex technical and societal dimensions of AI text detection in an era of increasingly sophisticated generative AI systems.


Source journal

Computer Science Review Computer Science-General Computer Science

CiteScore

32.70

Self-citation rate

0.00%

Publication volume

Review time

51 days

About the journal: Computer Science Review, a publication dedicated to research surveys and expository overviews of open problems in computer science, targets a broad audience within the field seeking comprehensive insights into the latest developments. The journal welcomes articles from various fields as long as their content impacts the advancement of computer science. In particular, articles that review the application of well-known Computer Science methods to other areas are in scope only if these articles advance the fundamental understanding of those methods.