Exploration of the Stacking Ensemble Machine Learning Algorithm for Cheating Detection in Large-Scale Assessment.

Impact factor: 2.3; Zone 3 (Psychology); JCR Q2 (Mathematics, Interdisciplinary Applications)

Educational and Psychological Measurement. Pub Date: 2023-08-01. Epub Date: 2022-08-13. DOI: 10.1177/00131644221117193

Todd Zhou, Hong Jiao

Citation: Educational and Psychological Measurement, vol. 83, no. 4, pp. 831-854 (2023). Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10311957/pdf/

Abstract

Cheating detection in large-scale assessment has received considerable attention in the extant literature. However, no previous study in this line of research has investigated the stacking ensemble machine learning algorithm for cheating detection, and none has addressed class imbalance through resampling. This study explored the application of the stacking ensemble machine learning algorithm to the item responses, response times, and augmented data of test-takers to detect cheating behaviors. The performance of the stacking method was compared with that of two other ensemble methods (bagging and boosting) as well as six base non-ensemble machine learning algorithms. Issues related to class imbalance and input features were also addressed. The results indicated that stacking, resampling, and feature sets including augmented summary data generally performed better than their counterparts in cheating detection. Across all study conditions, the stacking meta-model that applied discriminant analysis to the top two base models (Gradient Boosting and Random Forest) generally performed best when item responses and the augmented summary statistics were used as input features with an under-sampling ratio of 10:1.
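
To make the 10:1 under-sampling condition concrete, the following is a minimal sketch and not the authors' procedure. It assumes that the ratio means ten non-cheating cases kept for every flagged case and that the majority class is thinned by simple random sampling; the abstract confirms neither point. The function name `undersample`, the 0/1 label coding, and the example counts are hypothetical.

```python
import random
from collections import Counter


def undersample(labels, ratio=10, majority=0, minority=1, seed=0):
    """Return sorted row indices that keep every minority case and at most
    `ratio` randomly chosen majority cases per minority case.

    Assumption: plain random under-sampling; the source does not say
    which resampling scheme was actually used.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y == majority]
    # Never ask for more majority cases than exist.
    n_keep = min(len(majority_idx), ratio * len(minority_idx))
    kept_majority = rng.sample(majority_idx, n_keep)
    return sorted(minority_idx + kept_majority)


# Hypothetical data: 20 flagged test-takers (1) among 1,000 (0 = not flagged).
labels = [1] * 20 + [0] * 980
kept = undersample(labels, ratio=10)
counts = Counter(labels[i] for i in kept)
print(counts[0], counts[1])  # 200 20
```

Under these assumptions, the retained indices would select the training rows passed to the base models and the stacking meta-model. Resampling is normally applied to the training partition only, so that evaluation data keep the original class proportions.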

Source journal

Educational and Psychological Measurement (category: Medicine - Mathematics, Interdisciplinary Applications)

CiteScore: 5.50

Self-citation rate: 7.40%

Review time: 6-12 weeks

Journal description: Educational and Psychological Measurement (EPM) publishes refereed scholarly work from all academic disciplines interested in the study of measurement theory, problems, and issues. Theoretical articles address new developments and techniques, and applied articles deal with innovative applications.