A Framework for Detecting AI-Generated Text in Research Publications

PROCEEDINGS OF THE III INTERNATIONAL CONFERENCE ON ADVANCED TECHNOLOGIES IN MATERIALS SCIENCE, MECHANICAL AND AUTOMATION ENGINEERING: MIP: Engineering-III – 2021
Pub Date: 2023-08-19
DOI: 10.58190/icat.2023.28

Paria Sarzaeim, Aarya Mayurpalsingh Doshi, Qusay H. Mahmoud



Abstract

Generative artificial intelligence is increasingly used to create content in formats such as text, video, and images. Because misuse of these technologies can raise scientific and social challenges, there is a need to distinguish human-generated content from AI-generated content. There are also concerns about the reliability and comprehensiveness of AI-generated content that has not been validated by a human. This paper presents a framework for detecting AI-generated text. In the prototype implementation, a model is trained on predefined datasets and deployed on a cloud-based service to predict whether a text was created by a human or by AI. The approach targets scientific writing and research papers specifically, rather than general text. The proposed framework is compared with recently developed tools such as OpenAI Text Classifier, ZeroGPT, and Turnitin. The results show that training a text classifier can be highly useful for detecting whether a text was written by a human or by AI. The source code and dataset are open source so that others can experiment with the prototype implementation and use it in future research.
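The train-then-predict step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the model, features, or dataset, so everything below is an assumption made for illustration: a small multinomial naive Bayes classifier written with the Python standard library, a whitespace tokenizer, and four invented training sentences. It is not the authors' implementation, and the cloud deployment step is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.total_docs = len(labels)
        # Per-class word frequencies.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary shared by all classes.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, text):
        """Return the unnormalized log-posterior of each class for a text."""
        vocab_size = len(self.vocab)
        scores = {}
        for c in self.classes:
            class_total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / self.total_docs)
            for token in tokenize(text):
                if token in self.vocab:  # words never seen in training are skipped
                    count = self.word_counts[c][token]
                    score += math.log((count + 1) / (class_total + vocab_size))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy data, invented for this sketch; not the paper's dataset.
TRAIN_TEXTS = [
    "we measured the samples twice and honestly the results surprised us",
    "our lab ran the experiment again because the first run failed",
    "in conclusion it is important to note that the results are significant",
    "overall it is important to note that further research is crucial",
]
TRAIN_LABELS = ["human", "human", "ai", "ai"]

model = NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("it is important to note that the findings are crucial"))
```

A classifier this small only memorizes word frequencies from its four sentences. A usable detector would need a large labeled corpus of scientific writing and a held-out evaluation set, as the comparison against existing tools in the abstract implies.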

