Education Data Mining Application for Predicting Students’ Achievements of Portuguese Using Ensemble Model

Science Journal of Education. Pub Date: 2021-04-26. DOI: 10.11648/J.SJEDU.20210902.16

Shuai Zhang, Jie Chen, Wenyu Zhang, Qiwei Xu, Jiaxuan Shi


Citations: 5

Abstract

With the emergence of massive educational data, educational data mining techniques have drawn considerable interest from scholars seeking to explore the relationship between students’ achievement and other factors. This study mines a data set on students’ achievement in Portuguese at two secondary schools in Portugal; the data set covers personal information as well as social and school-related factors. To analyze the relationship between students’ achievement and these factors, the study proposes an ensemble model based on weighted voting for predicting students’ achievement in Portuguese in the final period. First, the raw data are preprocessed using basic methods, including dummy coding, correlation analysis, standardization, and normalization. Second, outlier adaptation based on the isolation forest algorithm is applied to the data set to enhance the robustness of the ensemble model. Finally, two base classifiers, gradient boosting decision tree and extreme gradient boosting, are integrated to form the ensemble model. Experiments verify the proposed model by comparing it with five base classifiers: gradient boosting decision tree, adaptive boosting, extreme gradient boosting, random forest, and decision tree. The experimental results demonstrate that the ensemble model outperforms the base classifiers in classification, and they confirm the validity of the outlier adaptation based on the isolation forest algorithm.
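The weighted-voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule or the weights, so everything here is an assumption: the sketch uses soft voting (a weighted average of class probabilities), the function name `weighted_vote` is hypothetical, and the probability values and the weights 0.4 and 0.6 are made-up placeholders standing in for the outputs of the two base classifiers, not results from the study.

```python
from typing import List, Sequence


def weighted_vote(
    prob_sets: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[int]:
    """Combine per-classifier class probabilities by weighted soft voting.

    prob_sets[k][i][c] is classifier k's probability that sample i
    belongs to class c. Returns the winning class index per sample.
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the weights sum to 1.
    norm = [w / total for w in weights]

    predictions = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        # Weighted average of the classifiers' probabilities for sample i.
        scores = [
            sum(norm[k] * prob_sets[k][i][c] for k in range(len(prob_sets)))
            for c in range(n_classes)
        ]
        # Pick the class with the highest combined score.
        predictions.append(max(range(n_classes), key=scores.__getitem__))
    return predictions


# Made-up probabilities for three samples and two classes, standing in
# for a gradient boosting decision tree and an extreme gradient boosting
# model (assumed values, not taken from the paper).
gbdt_probs = [[0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
xgb_probs = [[0.2, 0.8], [0.4, 0.6], [0.5, 0.5]]

labels = weighted_vote([gbdt_probs, xgb_probs], weights=[0.4, 0.6])
print(labels)  # [1, 1, 0]
```

With these placeholder values the two classifiers disagree on the first sample, and the larger weight on the second classifier decides the vote. In practice the weights would be chosen from validation performance, which the abstract does not detail.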


