Beyond Black Box AI generated Plagiarism Detection: From Sentence to Document Level

Workshop on Innovative Use of NLP for Building Educational Applications. Pub Date: 2023-06-13. DOI: 10.48550/arXiv.2306.08122

Mujahid Ali Quidwai, Chun Xing Li, Parijat Dube


Citations: 3

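Abstract

The increasing reliance on large language models (LLMs) in academic writing has led to a rise in plagiarism. Existing AI-generated text classifiers have limited accuracy and often produce false positives. We propose a novel approach using natural language processing (NLP) techniques that offers quantifiable metrics at both the sentence and document levels, making results easier for human evaluators to interpret. Our method generates multiple paraphrased versions of a given question and feeds them to the LLM to generate answers. Using a contrastive loss function based on cosine similarity, we match the generated sentences with those from the student's response. Our approach achieves up to 94% accuracy in classifying human and AI text, providing a robust and adaptable solution for plagiarism detection in academic settings. The method improves as LLMs advance, reducing the need for new model training or reconfiguration, and offers a more transparent way of evaluating and detecting AI-generated text.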
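The sentence-to-document matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the embedding model, threshold, or aggregation rule, so this sketch assumes a plain bag-of-words vector in place of embeddings trained with a contrastive loss, a hypothetical threshold of 0.8, and a document score defined as the fraction of student sentences that closely match any LLM-generated sentence. All function names are illustrative, and the generated answers are hard-coded rather than obtained from an LLM.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter on ., ! and ? (an assumption made for this sketch)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Stand-in for a learned sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_scores(student_text, generated_answers):
    """Sentence level: best similarity of each student sentence to any
    sentence in the pool of LLM-generated answers."""
    pool = [bag_of_words(s)
            for answer in generated_answers
            for s in split_sentences(answer)]
    scores = []
    for sentence in split_sentences(student_text):
        vec = bag_of_words(sentence)
        best = max((cosine_similarity(vec, g) for g in pool), default=0.0)
        scores.append((sentence, best))
    return scores


def document_score(scores, threshold=0.8):
    """Document level: fraction of student sentences at or above the
    (hypothetical) similarity threshold."""
    if not scores:
        return 0.0
    flagged = sum(1 for _, s in scores if s >= threshold)
    return flagged / len(scores)


# Answers an LLM might return for paraphrased versions of one question
# (hard-coded here; the real pipeline would query the model).
generated_answers = [
    "Photosynthesis converts light energy into chemical energy. "
    "It takes place in the chloroplasts.",
    "Plants use sunlight to make glucose from carbon dioxide and water.",
]
student_response = (
    "Photosynthesis converts light energy into chemical energy. "
    "My cat likes to sleep on the sofa."
)

scores = sentence_scores(student_response, generated_answers)
for sentence, score in scores:
    print(f"{score:.2f}  {sentence}")
print("document score:", document_score(scores))
```

Reporting the per-sentence scores alongside the document score is what gives a human evaluator something to inspect: each flagged sentence can be traced back to the generated sentence it resembles, rather than receiving a single opaque label.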



