Education Data Mining Application for Predicting Students’ Achievements of Portuguese Using Ensemble Model

Shuai Zhang, Jie Chen, Wenyu Zhang, Qiwei Xu, Jiaxuan Shi
Science Journal of Education. Published 2021-04-26. DOI: https://doi.org/10.11648/J.SJEDU.20210902.16
Citations: 5

Abstract

With the emergence of massive educational data, education data mining techniques have drawn considerable interest from scholars seeking to explore the relationship between students’ achievement and other factors. This study applies education data mining to a data set on students’ achievement in Portuguese at two secondary schools in Portugal; the data set covers personal information as well as social and school-related factors. To analyze the relationship between students’ achievement and these factors, the study proposes an ensemble model based on weighted voting for predicting students’ achievement in Portuguese in the final period. First, the raw data are preprocessed using basic methods, including dummy coding, correlation analysis, standardization, and normalization. Second, outlier adaption based on the isolation forest algorithm is applied to the data set to enhance the robustness of the ensemble model. Finally, two base classifiers, gradient boosting decision tree and extreme gradient boosting, are integrated to form the ensemble model. Experiments compare the proposed model with five base classifiers: gradient boosting decision tree, adaptive boosting, extreme gradient boosting, random forest, and decision tree. The experimental results show that the ensemble model outperforms the base classifiers in classification and confirm the validity of the outlier adaption based on the isolation forest algorithm.
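To make the weighted-voting step concrete, the following is a minimal sketch of weighted soft voting for a single sample. It rests on assumptions: the abstract does not say whether the voting is hard or soft, how the weights are chosen, or how many classes are used. The function names, the three grade bands, the probabilities, and the weights are all hypothetical values chosen for illustration, not the authors' settings.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-class probabilities from several classifiers for one sample.

    probas[i][k] is classifier i's probability for class k.
    weights[i] is the voting weight of classifier i.
    Returns the weight-normalized average probability of each class.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probas[0])
    if any(len(p) != n_classes for p in probas):
        raise ValueError("all classifiers must score the same classes")
    return [
        sum(w * p[k] for w, p in zip(weights, probas)) / total
        for k in range(n_classes)
    ]


def predict_class(
    probas: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> int:
    """Return the index of the class with the highest combined probability."""
    combined = weighted_soft_vote(probas, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical class probabilities for one student over three
# illustrative grade bands (low, medium, high). Assumed values only.
gbdt_proba = [0.2, 0.5, 0.3]  # stand-in for the gradient boosting decision tree
xgb_proba = [0.1, 0.3, 0.6]   # stand-in for extreme gradient boosting
weights = [0.4, 0.6]          # assumed weights, not taken from the paper

combined = weighted_soft_vote([gbdt_proba, xgb_proba], weights)
predicted = predict_class([gbdt_proba, xgb_proba], weights)
print(combined, predicted)
```

In this sketch the two classifiers disagree on the most likely band, and the higher-weighted classifier decides the outcome. In practice the weights would typically be tuned on validation data.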