A Comprehensive Comparative Analysis of Deep Learning Models for Student Performance Prediction in Virtual Learning Environments: Leveraging the OULA Dataset and Advanced Resampling Techniques

IF 3.4 | Zone 3, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access Pub Date : 2025-04-29 DOI:10.1109/ACCESS.2025.3564719

Bayan A. Alnasyan;Mohammed Basheri;Madini O. Alassafi

IEEE Access, vol. 13, pp. 75953-75972. Journal Article. https://ieeexplore.ieee.org/document/10979810/

Citations: 0

Abstract

Predicting student performance in Virtual Learning Environments (VLEs) has become increasingly important with the growth of online education. Early identification of at-risk students allows timely interventions to improve academic outcomes. This study evaluates the performance of several Deep Learning (DL) models for tabular data, including ResNet, NODE, AutoInt, TabNet, TabTransformer (TT), SAINT, and GatedTabTransformer (GTT). Moreover, it examines the role of resampling techniques, including SMOTE, ROS, ADASYN, RUS, and Tomek Links, in addressing class imbalance. Using the OULA dataset, eight experiments were conducted for binary and multi-class classification tasks, testing different feature combinations: 1) behavioral, 2) demographic and behavioral, 3) academic and behavioral, and 4) demographic, academic, and behavioral. The results indicate that incorporating a comprehensive set of characteristics can significantly enhance the model’s performance, with academic characteristics proving more predictive than demographic characteristics. The SAINT model achieved the highest performance in binary classification (94.33% accuracy), leveraging its ability to capture meaningful yet straightforward feature interactions. For multi-class classification, SAINT again outperformed other models, achieving an accuracy of 73.22% when using the Tomek Links method, excelling in managing complex feature interactions and underrepresented classes such as “Distinction.” Statistical analysis was done using the Friedman aligned ranks test and the Nemenyi post-test to compare how well the models performed based on F1-scores from several experiments. The non-parametric Friedman test revealed significant differences among the models ($p = 0.00013$). SAINT and AutoInt consistently outperformed the other approaches, while ResNet and TT demonstrated the weakest performance. Post-hoc analysis using the Nemenyi test did not show statistically significant differences among mid-tier models (TabNet, GTT, NODE). A critical difference (CD) further confirmed that SAINT and AutoInt are the most effective architectures for addressing complex, imbalanced educational data. These findings highlight the importance of aligning model selection and resampling techniques with the complexity of the task and the characteristics of the data.
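
As a rough illustration of the rank-based comparison the abstract describes, the sketch below computes per-model mean ranks, the standard Friedman chi-square statistic, and the Nemenyi critical difference in plain Python. It is not the authors' code: it uses the ordinary Friedman statistic rather than the aligned-ranks variant named in the abstract, it applies no tie correction, the model names and F1-scores are invented placeholders, and the value 2.343 is the commonly tabulated Nemenyi critical value for three models at a 0.05 significance level.

```python
from math import sqrt


def rank_row(scores):
    """Rank one experiment's scores: 1 = highest; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Return (mean rank per model, Friedman chi-square) for rows = experiments."""
    n, k = len(table), len(table[0])
    rank_rows = [rank_row(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


def nemenyi_cd(q_alpha, k, n):
    """Critical difference: two models differ if their mean ranks differ by more than this."""
    return q_alpha * sqrt(k * (k + 1) / (6 * n))


# Hypothetical F1-scores (NOT results from the paper):
# one row per experiment, one column per model.
models = ["model_a", "model_b", "model_c"]
f1_table = [
    [0.90, 0.85, 0.80],
    [0.88, 0.86, 0.79],
    [0.91, 0.84, 0.82],
    [0.87, 0.89, 0.78],
]

mean_ranks, chi2 = friedman_statistic(f1_table)
cd = nemenyi_cd(q_alpha=2.343, k=len(models), n=len(f1_table))

print(dict(zip(models, mean_ranks)))  # mean ranks 1.25, 1.75, 3.0
print(chi2)                           # 6.5
print(round(cd, 3))                   # 1.657
```

With these placeholder scores, model_a and model_c differ in mean rank by 1.75, which exceeds the critical difference, while model_a and model_b differ by only 0.5. This mirrors how a post-hoc test can separate the best and worst models yet leave neighbouring ones statistically indistinguishable.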

Source journal

IEEE Access
Categories: COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.