Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining

Q2 Social Sciences

International Journal of Modern Education and Computer Science Pub Date : 2023-06-08 DOI:10.5815/ijmecs.2023.03.04

S. Sengupta


Citations: 1

Abstract

An early prediction of students' academic performance helps to identify at-risk students and enables management to take corrective actions to prevent them from going astray. Most research in this field has applied supervised machine learning approaches to purpose-built datasets with numerous attributes or features. Since these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the different machine learning models employed in the classification task. In this work, we analyzed 27 research papers published in the last ten years (2011-2021) that used machine learning models for predicting students' performance. We identified the most frequently used features in the private datasets, their interrelationships, and abstraction levels. We also explored three popular public datasets and performed statistical analyses, such as the Chi-square test and Pearson's correlation, on their features. A minimal set of essential features is prepared by fusing the frequent features and the statistically significant features. We propose an algorithm for selecting a minimal set of features from any dataset with a given set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and the other with a minimal set of features. Most supervised models perform nearly identically with the reduced feature set and, in some cases, even better than with the complete feature set. The proposed method is capable of identifying the most essential feature set from any new dataset for predicting students' performance.
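The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea it describes: screen categorical features with a Chi-square test, screen numeric features with Pearson's correlation, and fuse the survivors with features that appear frequently in the literature. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data, the feature names, the 0.05 significance level, the correlation threshold `r_min`, and the reading of "fusing" as a set union.

```python
from collections import Counter
from math import sqrt

# Chi-square critical values at alpha = 0.05, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_stat(xs, ys):
    """Return (statistic, degrees of freedom) for two categorical columns."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


def pearson_r(xs, ys):
    """Pearson's correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(categorical, numeric, label, score, frequent, r_min=0.5):
    """Union of statistically significant features and frequent ones present in the data."""
    significant = set()
    for name, col in categorical.items():
        stat, dof = chi_square_stat(col, label)
        # Features whose dof is outside the small lookup table are skipped.
        if dof in CHI2_CRIT_05 and stat > CHI2_CRIT_05[dof]:
            significant.add(name)
    for name, col in numeric.items():
        if abs(pearson_r(col, score)) >= r_min:
            significant.add(name)
    available = set(categorical) | set(numeric)
    return sorted(significant | (set(frequent) & available))


# Hypothetical toy data for eight students.
label = ["pass"] * 4 + ["fail"] * 4
score = [80, 75, 90, 85, 40, 35, 50, 45]
categorical = {
    "internet": ["yes"] * 4 + ["no"] * 4,
    "school": ["A", "B"] * 4,
}
numeric = {
    "absences": [1, 2, 0, 1, 9, 10, 7, 8],
    "age": [15, 16] * 4,
}
frequent = ["school", "gender"]  # hypothetical "frequently used" features

selected = select_features(categorical, numeric, label, score, frequent)
print(selected)
```

In this toy run, `internet` and `absences` pass the statistical screens, `age` is dropped, and `school` is retained only because it is on the hypothetical frequent-feature list. A real analysis would more likely use a statistics library to obtain exact p-values rather than a fixed critical-value table.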




Source journal

International Journal of Modern Education and Computer Science Social Sciences-Education

CiteScore

4.70

Self-citation rate

0.00%

Number of publications