Evaluation of AI content generation tools for verification of academic integrity in higher education

Impact factor: 1.6 | JCR: Q2 | Category: Education & Educational Research

Journal of Applied Research in Higher Education. Pub Date: 2024-07-12. DOI: 10.1108/jarhe-10-2023-0470

Muhammad Bilal Saqib, Saba Zia


Citations: 0

Abstract

Purpose

The use of generative artificial intelligence (AI) engines for text composition has gained immense popularity among students, educators and researchers since the introduction of ChatGPT. However, this has added another dimension to the already daunting task of verifying originality in academic writing. Consequently, the market for detecting artificially generated content has seen a rapid proliferation of tools that claim to be more than 90% accurate in identifying artificially written content.

Design/methodology/approach

This research evaluates the capabilities of several frequently mentioned AI detection tools to separate reality from their hyperbolic claims. For this purpose, eight AI engines were tested on four types of data that cover the different ways of using ChatGPT: Original, Paraphrased by AI, 100% AI generated, and 100% AI generated with Contextual Information. The AI index that each tool recorded for each dataset was used as the indicator of its performance.
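The aggregation described above can be sketched as follows. This is a minimal illustration, assuming the AI index is a percentage and that the per-category figure is a plain arithmetic mean across tools; the abstract does not spell out the exact procedure. The function name `cumulative_mean`, the tool names and all scores in `example` are hypothetical placeholders, not data from the paper.

```python
from statistics import mean

# The four content categories named in the abstract.
CATEGORIES = (
    "Original",
    "Paraphrased by AI",
    "100% AI generated",
    "100% AI generated with Contextual Information",
)


def cumulative_mean(ai_index):
    """Average each category's AI index (in percent) across all tools.

    ai_index maps a tool name to a dict of {category: AI index in percent}.
    """
    return {
        category: mean(scores[category] for scores in ai_index.values())
        for category in CATEGORIES
    }


# Placeholder scores for two hypothetical tools; NOT figures from the paper.
example = {
    "tool_a": {
        "Original": 0.0,
        "Paraphrased by AI": 40.0,
        "100% AI generated": 80.0,
        "100% AI generated with Contextual Information": 60.0,
    },
    "tool_b": {
        "Original": 2.0,
        "Paraphrased by AI": 50.0,
        "100% AI generated": 70.0,
        "100% AI generated with Contextual Information": 50.0,
    },
}

result = cumulative_mean(example)
for category, value in result.items():
    print(f"{category}: {value:.2f}% AI content")
```

With eight tools instead of two, the same function would yield one mean per category, which is the form in which the findings are reported.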

Findings

The cumulative mean scores indicate that these tools excel at identifying human-written content (1.71% AI content) and perform reasonably well at labelling fully AI-generated content (76.85% AI content). However, they struggle when the content is either paraphrased by AI (39.42% AI content) or generated from a prompt that supplies a precise context for the output (60.1% AI content).
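To make these figures concrete, the sketch below applies a simple cutoff to the four reported means. The 50% threshold and the function name `flagged_as_ai` are assumptions introduced here for illustration; the abstract does not state which decision rule, if any, the tools or the authors use.

```python
# Cumulative mean AI index per category, as reported in the abstract (percent).
REPORTED_MEAN = {
    "Original": 1.71,
    "Paraphrased by AI": 39.42,
    "100% AI generated": 76.85,
    "100% AI generated with Contextual Information": 60.1,
}


def flagged_as_ai(mean_ai_index, threshold=50.0):
    """Map each category to True when its mean AI index exceeds the threshold.

    The default threshold of 50% is a hypothetical cutoff, not one from the paper.
    """
    return {
        category: value > threshold
        for category, value in mean_ai_index.items()
    }


verdicts = flagged_as_ai(REPORTED_MEAN)
for category, is_flagged in verdicts.items():
    label = "flagged as AI" if is_flagged else "passes as human"
    print(f"{category}: {label}")
```

Under this assumed cutoff, AI-paraphrased text would on average pass as human-written, which is consistent with the weakness the findings describe.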

Originality/value

This paper evaluates different services for detecting AI-generated content as a means of verifying academic integrity in research work and higher education, and provides new insights into their performance.




Source journal

Journal of Applied Research in Higher Education (Education & Educational Research)

CiteScore

4.50

Self-citation rate

11.80%

Number of articles published

About the journal: Higher education around the world has become a major topic of discussion, debate, and controversy, as a range of political, economic, social, and technological pressures result in a myriad of changes at all levels. But the quality and quantity of critical dialogue and research, and their relationship with practice, remain limited. This internationally peer-reviewed journal addresses this shortfall by focusing on the scholarship and practice of teaching and learning and higher education, and covers:
- Higher education teaching, learning, curriculum, assessment, policy, management, leadership, and related areas
- Digitization, internationalization, and democratization of higher education, and related areas such as lifelong and lifewide learning
- Innovation, change, and reflections on current practices