Educational data mining: a case study for predicting dropout-prone students
S. Kotsiantis
Int. J. Knowl. Eng. Soft Data Paradigms
DOI: 10.1504/IJKESDP.2009.022718
Citation count: 61
Abstract
Student dropout is common in universities that offer distance education, and dropout rates there are higher than in conventional universities. Limiting dropout is essential in university-level distance learning, so the ability to predict which students are likely to drop out could be useful in many ways. Data sets from this domain generally exhibit skewed class distributions: most cases belong to the normal class (students who continue their studies) and fewer to the dropout class, which is the class of greatest interest. A classifier induced from an imbalanced data set typically has a low error rate for the majority class and an unacceptably high error rate for the minority class. This paper first provides a systematic study of the various methodologies that have been proposed to handle this problem. It then presents an experimental study comparing these methodologies with a proposed local cost-sensitive technique, and concludes that such a framework can be a more effective solution to the problem.
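The imbalance problem described in the abstract can be made concrete with a small worked example. The sketch below is not the paper's local cost-sensitive technique, whose details the abstract does not give. Instead it uses a generic, global cost-sensitive decision rule: predict "dropout" whenever the expected cost of missing a dropout exceeds the expected cost of a false alarm. The toy labels, the predicted probabilities, the costs (5 for a missed dropout, 1 for a false alarm), and the helper names `per_class_error` and `cost_sensitive_predict` are all hypothetical and chosen only for illustration.

```python
from collections.abc import Sequence

# Hypothetical toy data: 1 = dropout, 0 = continues studying.
# 90 continuing students and 10 dropouts give a skewed 9:1 class distribution.
y_true = [0] * 90 + [1] * 10

# Hypothetical predicted dropout probabilities from some classifier.
# Continuing students: 80 look safe, 10 look somewhat at risk.
# Dropouts: 6 look somewhat at risk, 4 look safe.
p_dropout = [0.05] * 80 + [0.25] * 10 + [0.35] * 6 + [0.10] * 4


def per_class_error(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[int, float]:
    """Return the error rate computed separately for each class label."""
    errors: dict[int, float] = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        wrong = sum(1 for i in idx if y_pred[i] != label)
        errors[label] = wrong / len(idx)
    return errors


def cost_sensitive_predict(p: float, cost_fn: float, cost_fp: float) -> int:
    """Pick the label with the lower expected misclassification cost.

    Predicting 0 (continues) risks missing a dropout: expected cost p * cost_fn.
    Predicting 1 (dropout) risks a false alarm: expected cost (1 - p) * cost_fp.
    Equivalent to predicting 1 when p > cost_fp / (cost_fp + cost_fn).
    """
    return 1 if p * cost_fn > (1 - p) * cost_fp else 0


# Equal costs reduce to the usual 0.5 threshold: every student is predicted
# to continue, so every dropout (minority class) is misclassified.
equal_pred = [cost_sensitive_predict(p, cost_fn=1.0, cost_fp=1.0) for p in p_dropout]
equal_errors = per_class_error(y_true, equal_pred)
overall_equal = sum(a != b for a, b in zip(y_true, equal_pred)) / len(y_true)

# Making a missed dropout 5x as costly lowers the threshold to 1/6.
costly_pred = [cost_sensitive_predict(p, cost_fn=5.0, cost_fp=1.0) for p in p_dropout]
costly_errors = per_class_error(y_true, costly_pred)

if __name__ == "__main__":
    print("equal costs:", equal_errors, "overall error:", overall_equal)
    print("cost-sensitive:", costly_errors)
```

With equal costs the overall error is only 10%, yet every dropout is missed (minority-class error 1.0), which is exactly the failure mode the abstract describes. Raising the cost of a missed dropout catches 6 of the 10 dropouts (minority-class error 0.4) at the price of flagging 10 of the 90 continuing students (majority-class error of about 0.11).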