Analyze Students Performance of a National Exam Using Feature Selection Methods
Hanieh Zehtab Hashemi, Parvaneh Parvasideh, Zahra Hasan Larijani, Fatemeh Moradi
2018 8th International Conference on Computer and Knowledge Engineering (ICCKE), published 2018-10-01
DOI: 10.1109/ICCKE.2018.8566671 (https://doi.org/10.1109/ICCKE.2018.8566671)
Citations: 2
Abstract
Educational institutions now generate large volumes of data and are interested in analyzing them for their own applications. Data mining methods serve this purpose by extracting the knowledge these systems require. Such datasets are usually large, containing many samples and many unnecessary features, so analyzing them without preprocessing leads to inaccurate results. In this study, we identify and evaluate the most important features using several feature selection methods. Because these methods differ in nature, they produce different feature subsets; we therefore evaluate each subset by applying several machine learning methods to it. We use an educational dataset from one exam and aim to construct a reliable model that predicts the final outcome of that exam. Comparing the feature selection and machine learning algorithms, we find that Information Gain and Gain Ratio yield better performance.
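The abstract names Information Gain and Gain Ratio but does not define them. As a rough illustration only, the sketch below computes both scores from their standard textbook definitions for categorical features. The feature names, the toy data, and the helper functions (`entropy`, `information_gain`, `gain_ratio`) are all hypothetical and are not taken from the paper, whose dataset and tooling are not described here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature, labels):
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the entropy of the feature itself (split information)."""
    split_info = entropy(feature)
    if split_info == 0:
        return 0.0  # a constant feature cannot split the data
    return information_gain(feature, labels) / split_info


# Made-up toy data (assumption): four students and three candidate features.
outcome = ["pass", "pass", "fail", "fail"]
features = {
    "attended_prep": ["yes", "yes", "no", "no"],  # separates the outcome perfectly
    "region": ["a", "b", "a", "b"],               # carries no information
    "student_id": ["s1", "s2", "s3", "s4"],       # unique value per row
}

# Rank features from most to least useful according to gain ratio.
ranking = sorted(
    features,
    key=lambda name: gain_ratio(features[name], outcome),
    reverse=True,
)

for name in ranking:
    ig = information_gain(features[name], outcome)
    gr = gain_ratio(features[name], outcome)
    print(f"{name}: information gain = {ig:.2f}, gain ratio = {gr:.2f}")
```

In this toy data, `student_id` reaches the same information gain as `attended_prep` only because every row has its own value. Gain Ratio divides by the split information and so ranks it lower, which is the usual reason the two scores are reported together.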