Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level

Workshop on Innovative Use of NLP for Building Educational Applications
Publication date: 2023-06-13
DOI: 10.48550/arXiv.2306.08122

Mujahid Ali Quidwai, Chun Xing Li, Parijat Dube

Citations: 3

Abstract

The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques that offers quantifiable metrics at both the sentence and document levels, making the results easier for human evaluators to interpret. Our method generates multiple paraphrased versions of a given question and feeds them into the LLM to obtain answers. Using a contrastive loss function based on cosine similarity, we match these generated sentences with those in the student's response. Our approach achieves up to 94% accuracy in distinguishing human-written from AI-generated text, providing a robust and adaptable solution for plagiarism detection in academic settings. The method improves as LLMs advance, reducing the need to train or reconfigure new models, and offers a more transparent way of evaluating and detecting AI-generated text.
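
The sentence-to-document matching idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the LLM answers to the paraphrased questions have already been collected, replaces the paper's contrastive, cosine-similarity-based matching with plain cosine similarity over term-frequency vectors so the example runs without dependencies, and uses a naive regex sentence splitter. The helper names (`sentence_scores`, `document_score`, `flag_document`) and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", text.lower())


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def embed(sentence: str) -> Counter:
    # Stand-in for a learned sentence embedding: a term-frequency vector.
    return Counter(tokenize(sentence))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_scores(student_response: str, llm_answers: list[str]) -> list[tuple[str, float]]:
    # Sentence level: each student sentence keeps its best match
    # against every sentence of every LLM-generated answer.
    generated = [embed(s) for answer in llm_answers for s in split_sentences(answer)]
    results = []
    for sent in split_sentences(student_response):
        vec = embed(sent)
        best = max((cosine(vec, g) for g in generated), default=0.0)
        results.append((sent, best))
    return results


def document_score(scores: list[tuple[str, float]]) -> float:
    # Document level: mean of the sentence-level best-match scores.
    if not scores:
        return 0.0
    return sum(score for _, score in scores) / len(scores)


def flag_document(student_response: str, llm_answers: list[str], threshold: float = 0.5) -> bool:
    # Hypothetical decision rule: flag when the document score reaches the threshold.
    return document_score(sentence_scores(student_response, llm_answers)) >= threshold


if __name__ == "__main__":
    answers = [
        "Photosynthesis converts light energy into chemical energy. It takes place in the chloroplasts.",
        "Plants use photosynthesis to turn sunlight into chemical energy.",
    ]
    student = (
        "Photosynthesis converts light energy into chemical energy. "
        "I watched a documentary about it last week."
    )
    for sent, score in sentence_scores(student, answers):
        print(f"{score:.3f}  {sent}")
    print("document score:", round(document_score(sentence_scores(student, answers)), 3))
```

Each student sentence receives its own score, which corresponds to the sentence-level metric; the mean over sentences serves as the document-level metric, so an evaluator can see which sentences drove the overall result. In a realistic setting, `embed` would be swapped for a trained sentence encoder.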
