Comparing hand-crafted and deep learning approaches for detecting AI-generated text: performance, generalization, and linguistic insights

AI and Ethics. Pub Date: 2025-04-10. DOI: 10.1007/s43681-025-00699-4

Ramtin Ardeshirifar

Abstract

This study investigates techniques for detecting machine-generated text, a critical task in the era of advanced language models. We compare two approaches: a method based on hand-crafted features and a deep learning method built on RoBERTa. Experiments were conducted on diverse datasets, including the Human ChatGPT Comparison Corpus (HC3) and GPT-2 outputs. The hand-crafted approach achieved a 94% F1 score on HC3 but struggled with cross-dataset generalization. In contrast, the RoBERTa-based method demonstrated superior performance and adaptability, achieving a 98% F1 score on HC3 and 97.68% on GPT-2. Our findings underscore the need for adaptive detection methods as language models evolve. This research contributes to the development of robust techniques for identifying AI-generated content, addressing critical challenges in AI ethics and responsible technology use.
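
As a rough illustration of the two ideas the abstract relies on, the sketch below computes a few surface-level text features and the F1 score used to report results. The abstract does not list the paper's actual features, so the three shown here (average sentence length, type-token ratio, punctuation rate) and the function names `stylometric_features` and `f1_score` are assumptions chosen for illustration only, not the author's method.

```python
import re


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few illustrative surface features.

    These features are assumptions for demonstration; the paper's
    real hand-crafted feature set is not given in the abstract.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words,
        "punct_per_word": len(re.findall(r"[,;:]", text)) / n_words,
    }


def f1_score(y_true: list[int], y_pred: list[int]) -> float:
    """Binary F1: the harmonic mean of precision and recall.

    Label 1 is treated as the positive class (e.g. machine-generated).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In a feature-based detector of this kind, vectors such as the one returned by `stylometric_features` would typically be fed to a conventional classifier, and `f1_score` would then be evaluated on held-out text from the same dataset and from a different one to measure cross-dataset generalization.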
