Towards Finding a Minimal Set of Features for Predicting Students' Performance Using Educational Data Mining

Q2 Social Sciences
S. Sengupta
Journal: International Journal of Modern Education and Computer Science
Published: 2023-06-08
Publication type: Journal Article
DOI: 10.5815/ijmecs.2023.03.04 (https://doi.org/10.5815/ijmecs.2023.03.04)
Citations: 1

Abstract

Early prediction of students' academic performance helps identify at-risk students and enables management to take corrective action to prevent them from going astray. Most research in this field has applied supervised machine learning to purpose-built datasets with numerous attributes, or features. Because these datasets are not publicly available, it is hard to understand and compare the significance of the chosen features and the efficacy of the different machine learning models employed in the classification task. In this work, we analyzed 27 research papers published in the last ten years (2011-2021) that used machine learning models to predict students' performance. We identified the most frequently used features in the private datasets, their interrelationships, and their abstraction levels. We also explored three popular public datasets and performed statistical analyses, such as the Chi-square test and Pearson's correlation, on their features. A minimal set of essential features is prepared by fusing the frequent features with the statistically significant features. We propose an algorithm for selecting a minimal set of features from any dataset with a given set of features. We compared the performance of different machine learning models on the three public datasets in two experimental setups: one with the complete feature set and the other with the minimal set of features. Most supervised models perform nearly identically with the reduced feature set and, in some cases, even better than with the complete feature set. The proposed method is capable of identifying the most essential feature set from any new dataset for predicting students' performance.
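The abstract does not spell out the selection algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: categorical features are screened with a Chi-square test of independence against a pass/fail label, numeric features with Pearson's correlation against a numeric score, and "fusing" is taken to mean the union of the significant features with the frequently used ones that are present in the dataset. The feature names, the toy data, the 0.05 significance level, and the 0.5 correlation threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt

# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square(xs, ys):
    """Return (statistic, dof) for a test of independence between two categorical sequences."""
    n = len(xs)
    rows, cols, observed = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(rows) - 1) * (len(cols) - 1)


def pearson(xs, ys):
    """Return Pearson's correlation coefficient (0.0 if either sequence is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def minimal_features(categorical, numeric, labels, scores, frequent, r_threshold=0.5):
    """Assumed fusion rule: union of significant and frequent features found in the dataset."""
    significant = set()
    for name, values in categorical.items():
        stat, dof = chi_square(values, labels)
        # Features whose dof is outside the small lookup table are skipped in this sketch.
        if dof in CHI2_CRIT_05 and stat > CHI2_CRIT_05[dof]:
            significant.add(name)
    for name, values in numeric.items():
        if abs(pearson(values, scores)) >= r_threshold:
            significant.add(name)
    available = set(categorical) | set(numeric)
    return sorted((significant | set(frequent)) & available)


# Hypothetical toy data for eight students.
labels = ["pass"] * 4 + ["fail"] * 4
scores = [18, 17, 15, 16, 8, 6, 9, 5]
categorical = {
    "internet": ["yes"] * 4 + ["no"] * 4,
    "gender": ["f", "m", "f", "m", "f", "m", "f", "m"],
}
numeric = {
    "absences": [0, 1, 2, 1, 8, 9, 7, 10],
    "age": [16, 17, 16, 17, 16, 17, 16, 17],
}
# Hypothetical "frequently used" features; "attendance" is absent from this dataset.
frequent = ["gender", "attendance"]

selected = minimal_features(categorical, numeric, labels, scores, frequent)
print(selected)
```

In this toy run, "internet" and "absences" pass the statistical screens, "gender" is retained only because it appears in the assumed frequent list, and "age" is dropped. Whether the paper's fusion step is a union, an intersection, or something else is not stated in the abstract, so that choice should be checked against the full text.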
Source journal metrics: CiteScore 4.70; self-citation rate 0.00%; articles published 29.