Synthetic Minority Oversampling Technique (SMOTE) for Boosting the Accuracy of C4.5 Algorithm Model
Authors: Wiwi Rahayu, Deny Jollyta, Alyauma Hajjah, Johan, Gusrianty, Gustientiedina, Yulvia Nora Marlim, Y. Desnelita
Journal: Journal of Artificial Intelligence and Engineering Applications (JAIEA)
Published: 2024-06-05
DOI: 10.59934/jaiea.v3i3.469 (https://doi.org/10.59934/jaiea.v3i3.469)
Abstract
Low classification accuracy can be caused by class imbalance in the dataset, and a low-accuracy model is of little practical use. This research addresses class imbalance in an employee performance dataset classified with the C4.5 algorithm, using SMOTE as the resampling approach. SMOTE generates additional synthetic samples for the under-represented classes; before resampling, classification accuracy was only 17%. The C4.5 algorithm then classifies the new dataset created by SMOTE, which has 11 attributes and is split into training and testing data in three different ratios. With a 60:40 split, the classification model reached 69% accuracy. Accuracy rose to 76% with a 70:30 split and to 86% with the final 80:20 split. These figures agree with the evaluation results obtained from the confusion matrix. The findings indicate that SMOTE can improve classification accuracy by adding data to under-represented classes.
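The abstract does not describe the implementation, so the following is only a minimal sketch of the general idea behind SMOTE and the three split ratios, not the authors' code. It assumes purely numeric features and a two-class toy dataset; the helper names `smote` and `split`, the neighbour count `k`, and all data values are hypothetical. The C4.5 classifier itself is left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of numeric feature vectors that all belong to one class.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def split(rows, train_percent, seed=0):
    """Shuffle rows and cut them into (train, test) at train_percent."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) * train_percent // 100
    return rows[:cut], rows[cut:]


# Hypothetical toy data, not taken from the paper.
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]  # 20 rows
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0], [12.0, 11.0]]    # 4 rows

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced = [(p, 0) for p in majority] + [(p, 1) for p in minority + new_points]

# The three train/test ratios mentioned in the abstract.
for train_percent in (60, 70, 80):
    train, test = split(balanced, train_percent)
    print(f"{train_percent}:{100 - train_percent} -> {len(train)} train, {len(test)} test")
```

Because each synthetic point lies on the line segment between two existing minority samples, the new data stays inside the region the minority class already occupies. In a real experiment, resampling is often applied to the training portion only so that the test set keeps its original class distribution; whether the study did this is not stated in the abstract.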