Advancing student outcome predictions through generative adversarial networks

Q1 Social Sciences

Computers and Education Artificial Intelligence. Pub Date: 2024-09-17. DOI: 10.1016/j.caeai.2024.100293

Helia Farhood, Ibrahim Joudah, Amin Beheshti, Samuel Muller

Volume 7, Article 100293. Full text: https://www.sciencedirect.com/science/article/pii/S2666920X24000961

Citations: 0

Abstract

Predicting student outcomes is essential in educational analytics for creating personalised learning experiences. The effectiveness of predictive models depends on access to sufficient and accurate data. However, privacy concerns and the lack of student consent often restrict data collection, limiting the applicability of these models. To tackle this obstacle, we employ Generative Adversarial Networks (GANs), a type of Generative AI, to generate tabular data that replicates and enlarges two distinct publicly available student datasets. The ‘Math dataset’ has 395 observations and 33 features, whereas the ‘Exam dataset’ has 1000 observations and 8 features. Our methodology uses Python implementations of Conditional Tabular Generative Adversarial Networks and Copula Generative Adversarial Networks and consists of two phases. The first is a mirroring approach, in which we produce synthetic data matching the volume of the real datasets, focusing on privacy and evaluating predictive accuracy. The second is an augmenting approach, in which we add newly created synthetic observations to the real datasets to fill gaps in datasets that lack student data. Before employing these approaches, we validate the synthetic data using Correlation Analysis, Density Analysis, Correlation Heatmaps, and Principal Component Analysis. We then compare the accuracy of predicting whether students will pass or fail their exams across the original, synthetic, and augmented datasets. Employing Feedforward Neural Networks, Convolutional Neural Networks, and Gradient-boosted Neural Networks, with Bayesian optimisation for hyperparameter tuning, this research methodically examines the impact of synthetic data on prediction accuracy. We implement and optimise these models in Python. Our mirroring approach aims to achieve accuracy rates that closely align with those obtained on the original data. Meanwhile, our augmenting approach seeks to reach a slightly higher accuracy level than learning from the original data alone. Our findings meet these objectives and provide actionable insights into leveraging advanced Generative AI techniques to enhance educational outcomes.
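The two phases described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' pipeline: the generator below is a deliberately simple stand-in (resampling real rows and adding Gaussian jitter) used in place of a trained Conditional Tabular or Copula GAN, and the two columns (`hours`, `score`), the 200 extra rows, and all numeric values are hypothetical. Only the row count of 395 is taken from the abstract's Math dataset. The final step mimics the correlation check by comparing one pairwise correlation between the real and mirrored data.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def make_toy_real(n, rng):
    """Hypothetical 'real' table: (hours, score) rows with a positive trend."""
    rows = []
    for _ in range(n):
        hours = rng.uniform(0, 10)
        score = 40 + 5 * hours + rng.gauss(0, 5)
        rows.append((hours, score))
    return rows


def stand_in_generator(real, n, rng, noise=0.5):
    """Placeholder for a trained tabular GAN (assumption, not a GAN):
    resample real rows and jitter each value with Gaussian noise."""
    synthetic = []
    for _ in range(n):
        hours, score = rng.choice(real)
        synthetic.append((hours + rng.gauss(0, noise),
                          score + rng.gauss(0, noise)))
    return synthetic


def column_correlation(rows):
    """Correlation between the two columns of a list of (hours, score) rows."""
    return pearson([r[0] for r in rows], [r[1] for r in rows])


rng = random.Random(0)
real = make_toy_real(395, rng)  # 395 rows, as in the Math dataset; values are made up

# Phase 1 (mirroring): a purely synthetic table with the same number of rows.
mirrored = stand_in_generator(real, len(real), rng)

# Phase 2 (augmenting): real rows plus extra synthetic rows (200 is arbitrary).
augmented = real + stand_in_generator(real, 200, rng)

# Validation in the spirit of the correlation analysis: the synthetic table
# should preserve the relationship between columns seen in the real table.
corr_gap = abs(column_correlation(real) - column_correlation(mirrored))
print(f"real rows={len(real)}, mirrored rows={len(mirrored)}, "
      f"augmented rows={len(augmented)}, correlation gap={corr_gap:.3f}")
```

In an actual study, the stand-in generator would be replaced by a fitted tabular GAN, and the single correlation gap by the fuller set of checks the abstract lists (density comparisons, correlation heatmaps, and Principal Component Analysis) before any classifier is trained on the mirrored or augmented data.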


Source journal

Computers and Education Artificial Intelligence (Social Sciences, Education)

CiteScore

16.80

Self-citation rate

0.00%

Articles published

Review time

50 days