An Early Warning Method Based on Blending of Deep Generative Model and Oversampling Model for Online Learning

IF 3.4 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Access | Pub Date: 2025-04-23 | DOI: 10.1109/ACCESS.2025.3563642

Mingyan Zhang; Yiqing Wang; Jui-Long Hung; Jie Wang; Chao Duan

Vol. 13, pp. 72248-72268 | https://ieeexplore.ieee.org/document/10974956/

Citations: 0

Abstract




Early warning for learning performance requires identifying as many at-risk students as possible, as early as possible within a semester. However, educational data often suffer from class imbalance, which makes it challenging to achieve both high precision (accurate identification) and high recall (comprehensive coverage) in at-risk student detection. Deep generative models and oversampling models are effective methods for addressing data imbalance and can improve classification performance. This paper proposes a method that combines the advantages of deep generative models and oversampling models in a blending model for imbalanced educational data, which effectively improves the precision, recall, F1-score, and AUC of online learning early warning. First, we compare baseline models to select the best classifier. We then choose the highest-precision deep generative model and the highest-recall oversampling model to construct blending models, which are shown to improve early warning prediction metrics. Finally, interpretable models are used to analyze differences in at-risk student prediction among the blending model, the deep generative model, and the oversampling model. The proposed models are validated on both extremely imbalanced datasets and new-semester datasets. Results show that:
(1) Compared with the baseline model, base learners built with either the deep generative model or the oversampling model improve the evaluation metrics. The deep generative base learners achieve higher precision than the oversampling base learners, while the oversampling base learners achieve higher recall than the deep generative base learners.
(2) The blending model composed of a deep generative base learner and an oversampling base learner builds on their individual strengths to further improve the F1-score and AUC. It can also deliver effective early warning three units earlier than the baseline models.
(3) Compared with its base learners, the blending model G-B-Blending changes the key predictive variables. The at-risk students it identifies come from the union of the at-risk students identified individually by GAN+GB and B-SMOTE+GB.
(4) The proposed blending model achieves better prediction results than the baseline on both extremely imbalanced datasets and new-semester datasets. It identifies more at-risk students, more accurately, in earlier units, leaving teachers more time and energy for teaching interventions.
This research provides significant insights into handling imbalanced educational datasets by blending deep generative and oversampling models.
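The abstract does not spell out how G-B-Blending combines its base learners. The sketch below is a minimal, hypothetical illustration in pure Python. It assumes the two base learners' at-risk probabilities are combined by a weighted average; blending in general often trains a meta-learner on holdout predictions instead. It also uses made-up toy scores, not the paper's data.

The names `p_precise`, `p_recall`, `flagged`, `blend` and `precision_recall_f1` are all illustrative. The sketch shows how averaging a high-precision learner with a high-recall learner can raise F1. It also shows why, under averaging, every student the blend flags is already flagged by at least one base learner, which mirrors result (3).

```python
from typing import List, Sequence, Set, Tuple

# Toy scores for illustration only (NOT the paper's data). Label 1 = at-risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical high-precision learner (plays the role of GAN+GB).
p_precise = [0.90, 0.80, 0.45, 0.25, 0.05, 0.10, 0.10, 0.05]
# Hypothetical high-recall learner (plays the role of B-SMOTE+GB).
p_recall = [0.80, 0.70, 0.70, 0.65, 0.60, 0.55, 0.20, 0.10]


def flagged(probs: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Indices of students whose at-risk probability reaches the threshold."""
    return {i for i, p in enumerate(probs) if p >= threshold}


def blend(p_a: Sequence[float], p_b: Sequence[float], w: float = 0.5) -> List[float]:
    """Convex combination of two base learners' probabilities (assumed combiner)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


def precision_recall_f1(y: Sequence[int], pred: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the at-risk class given the flagged indices."""
    tp = sum(1 for i in pred if y[i] == 1)
    fp = len(pred) - tp
    fn = sum(1 for i, label in enumerate(y) if label == 1 and i not in pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    blended = blend(p_precise, p_recall)
    for name, probs in (("precise", p_precise), ("recall", p_recall), ("blend", blended)):
        p, r, f = precision_recall_f1(y_true, flagged(probs))
        print(f"{name:<8} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    # A convex combination never exceeds the larger of its inputs, so a student
    # can only cross the threshold in the blend if at least one base learner
    # already flags them: the blend's at-risk set lies inside the union.
    assert flagged(blended) <= flagged(p_precise) | flagged(p_recall)
```

On these toy scores the results are:

- High-precision learner: precision 1.00, recall 0.50, F1 0.67.
- High-recall learner: precision 0.67, recall 1.00, F1 0.80.
- Average: precision 1.00, recall 0.75, F1 0.86.

With a trained meta-learner, the union property is not guaranteed by construction. In that setting it becomes an empirical observation, as reported in result (3).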


Source journal

IEEE Access | COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC

CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks

Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals. Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering. Development of new or improved fabrication or manufacturing techniques. Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.