An Early Warning Method Based on Blending of Deep Generative Model and Oversampling Model for Online Learning
Mingyan Zhang; Yiqing Wang; Jui-Long Hung; Jie Wang; Chao Duan
IEEE Access, vol. 13, pp. 72248-72268, published 2025-04-23. DOI: 10.1109/ACCESS.2025.3563642
https://ieeexplore.ieee.org/document/10974956/
Early warning of learning performance requires identifying as many at-risk students as possible, as early as possible within a semester. However, educational data often suffer from class imbalance, which makes it difficult to achieve both high precision (accurate identification) and high recall (comprehensive coverage) in at-risk student detection. Deep generative models and oversampling models are effective methods for addressing data imbalance and can improve classification performance. This paper proposes a method that combines the advantages of deep generative models and oversampling models in a blending model for imbalanced educational data, which can effectively improve precision, recall, F1-score and AUC for online learning early warning. First, we compare baseline models to select the best classifier; we then choose the highest-precision deep generative model and the highest-recall oversampling model to construct blending models, which are shown to improve early warning prediction metrics. Finally, interpretable models are used to analyze differences in at-risk student prediction among the blending model, the deep generative model, and the oversampling model. The proposed models are validated on both extremely imbalanced datasets and new-semester datasets. Results show that: (1) Compared to the baseline model, the base learners built with the deep generative model and with the oversampling model both improve the evaluation metrics; the deep generative base learners achieve higher precision than the oversampling base learners, while the oversampling base learners achieve higher recall than the deep generative base learners. (2) The blending model composed of a deep generative base learner and an oversampling base learner builds on their individual strengths to further improve F1-score and AUC, and it can also provide effective early warning three units earlier than the baseline models. (3) Compared to its base learners, the blending model G-B-Blending relies on different key predictive variables, and the at-risk students it identifies come from the union of the at-risk students identified individually by GAN+GB and B-SMOTE+GB. (4) The proposed blending model achieves better prediction results than the baseline on both extremely imbalanced datasets and new-semester datasets; it identifies more at-risk students, more accurately, at earlier units, allowing teachers to devote more time and energy to teaching interventions. This research provides significant insights into handling imbalanced datasets in education by blending deep generative and oversampling models.
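The abstract's central idea is that a high-precision base learner (GAN+GB) and a high-recall base learner (B-SMOTE+GB) can be combined so that F1 improves. The sketch below illustrates that intuition in plain Python under stated assumptions: the two base learners' at-risk probabilities on a holdout set are taken as given (the GAN and Borderline-SMOTE training steps are not shown), and the blend is a simple convex weighted average whose weight is grid-searched on the holdout set to maximise F1. The abstract does not specify the paper's combiner, so this is a simplified stand-in, not the G-B-Blending model; the function names and toy data are hypothetical.

```python
# Illustrative sketch only: a simplified weighted-average "blend" of two base
# learners. The paper's actual combiner is not described in the abstract; the
# names below (blend, flag, fit_blend_weight, ...) are hypothetical.
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive (at-risk) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def blend(p_a: Sequence[float], p_b: Sequence[float], w: float) -> List[float]:
    """Convex combination w * p_a + (1 - w) * p_b of two probability vectors."""
    return [w * a + (1 - w) * b for a, b in zip(p_a, p_b)]


def flag(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Label a student at risk (1) when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probs]


def fit_blend_weight(
    y_holdout: Sequence[int],
    p_a: Sequence[float],
    p_b: Sequence[float],
    grid_size: int = 11,
    threshold: float = 0.5,
) -> Tuple[float, float]:
    """Grid-search the weight w in [0, 1] that maximises holdout F1."""
    if grid_size < 2:
        raise ValueError("grid_size must be at least 2")
    best_w, best_f1 = 0.0, -1.0
    for i in range(grid_size):
        w = i / (grid_size - 1)
        _, _, f1 = precision_recall_f1(y_holdout, flag(blend(p_a, p_b, w), threshold))
        if f1 > best_f1:  # strict ">" keeps the smallest w among ties
            best_w, best_f1 = w, f1
    return best_w, best_f1


if __name__ == "__main__":
    # Toy holdout data, made up for illustration (not results from the paper).
    y = [1, 1, 1, 0, 0, 0, 0, 0, 1]
    p_precise = [0.9, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1, 0.2]   # flags few students
    p_recall = [0.8, 0.9, 0.65, 0.62, 0.55, 0.3, 0.2, 0.1, 0.4]  # flags many students

    for name, probs in [("precise", p_precise), ("recall", p_recall)]:
        print(name, precision_recall_f1(y, flag(probs)))

    w, _ = fit_blend_weight(y, p_precise, p_recall)
    print("blend weight", w, precision_recall_f1(y, flag(blend(p_precise, p_recall, w))))
```

On this toy data the precise learner alone reaches precision 1.0 but recall 0.25, the recall-oriented learner reaches recall 0.75 but precision 0.6, and the grid search selects w = 0.3, where the blend keeps precision 1.0 with recall 0.75 (F1 ≈ 0.857). Because a convex combination with a shared threshold can only reach the threshold when at least one input does, every student this blend flags is also flagged by at least one base learner, which is consistent with finding (3), although the paper's own blend need not be a weighted average. Choosing w on the same holdout set used for reporting is optimistic; a separate test split (or a new semester's data, as the paper uses for validation) should be used for evaluation.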
IEEE Access (COMPUTER SCIENCE, INFORMATION SYSTEMS; ENGINEERING, ELECTRICAL & ELECTRONIC)
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.