Synthesis Ensemble Oversampling and Ensemble Tree-Based Machine Learning for Class Imbalance Problem in Breast Cancer Diagnosis
Slamet Sudaryanto N, M. Purnomo, D. Purwitasari, E. M. Yuniarno
2022 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), published 22 November 2022
DOI: 10.1109/CENIM56801.2022.10037251
Abstract
The Wisconsin Breast Cancer Database dataset is class-imbalanced. On imbalanced data, accuracy mainly reflects performance on the majority class and can hide poor performance on the minority class. The ensemble oversampling methods used are SMOTE and Random Over Sampling, and the tree-based ensemble classifiers used are Random Forest, Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost). At the level-1 ensemble stage, the best-performing ensemble model is selected as input to the level-2 ensemble. The level-2 ensemble is a boosting ensemble in which the best model chosen at level 1 serves as the base model for boosting with the XGBoost algorithm. The framework achieved a 10-fold cross-validation score of 0.981, an accuracy of 0.987, a recall of 0.980, and a precision of 0.982. The proposed framework outperforms several recent classification studies in the breast cancer domain.
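The claim that accuracy favors the majority class can be checked with a small worked example. The sketch below computes accuracy, precision, and recall from scratch; the function name `classification_metrics` and the 90/10 label split are hypothetical illustrations, not data from the paper. A classifier that always predicts the majority class reaches 0.9 accuracy while finding none of the minority cases.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary task, with `positive` as the minority class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # 0.0 when nothing is predicted positive
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical imbalanced labels: 90 negatives, 10 positives (the minority class is 1).
y_true = [0] * 90 + [1] * 10
always_majority = [0] * 100
print(classification_metrics(y_true, always_majority))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0}
```

This is why the abstract reports recall and precision alongside accuracy: recall in particular exposes a model that ignores the minority class.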
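The two oversampling methods named in the abstract differ in how they create new minority rows: Random Over Sampling duplicates existing minority rows, while SMOTE interpolates between a minority row and one of its nearest minority neighbours. A minimal, dependency-free sketch of both follows, assuming binary labels and numeric feature lists; `random_oversample`, `smote`, and the toy data are hypothetical, and in practice a library such as imbalanced-learn (which provides `SMOTE` and `RandomOverSampler`) would normally be used.

```python
import math
import random


def _split(X, y, minority):
    """Return the minority rows and the size of the majority class (binary labels assumed)."""
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_majority = sum(1 for label in y if label != minority)
    return minority_rows, n_majority


def random_oversample(X, y, minority, seed=0):
    """Random Over Sampling: duplicate random minority rows until the classes are balanced."""
    rng = random.Random(seed)
    minority_rows, n_majority = _split(X, y, minority)
    extra = [list(rng.choice(minority_rows)) for _ in range(n_majority - len(minority_rows))]
    return X + extra, y + [minority] * len(extra)


def smote(X, y, minority, k=5, seed=0):
    """SMOTE: interpolate between a minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    minority_rows, n_majority = _split(X, y, minority)
    if len(minority_rows) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority_rows) - 1)
    synthetic = []
    for _ in range(n_majority - len(minority_rows)):
        i = rng.randrange(len(minority_rows))
        base = minority_rows[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority_rows)) if j != i),
            key=lambda j: math.dist(base, minority_rows[j]),
        )[:k]
        neighbour = minority_rows[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


# Hypothetical toy data: five majority rows (label 0), two minority rows (label 1).
X = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [10, 10], [10, 11]]
y = [0, 0, 0, 0, 0, 1, 1]
X_ros, y_ros = random_oversample(X, y, minority=1)
X_sm, y_sm = smote(X, y, minority=1)
print(y_ros.count(0), y_ros.count(1))  # 5 5
print(y_sm.count(0), y_sm.count(1))    # 5 5
```

Applying either function only to the training folds, rather than to the full dataset before cross-validation, keeps synthetic copies of test rows out of training; the abstract does not say which order the authors used.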
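The level-1 stage keeps the single best-performing ensemble model. The abstract does not state the selection criterion, so this sketch assumes mean stratified 10-fold accuracy. Random Forest, AdaBoost, and XGBoost need external libraries, so the two candidates here (`majority_model`, `nearest_centroid_model`) are hypothetical placeholders that only exercise the selection loop; any function with the same `(X, y) -> predict` shape could be substituted.

```python
import math
import random


def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that spread each class evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for fold in folds:
        test = set(fold)
        yield [i for i in range(len(y)) if i not in test], sorted(test)


# Placeholder candidates: each takes (X, y) and returns a predict(x) function.
# They stand in for Random Forest, AdaBoost and XGBoost.
def majority_model(X, y):
    top = max(set(y), key=y.count)
    return lambda x: top


def nearest_centroid_model(X, y):
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def select_best(candidates, X, y, k=10):
    """Level-1 selection: return the candidate with the highest mean k-fold accuracy."""
    scores = {}
    for name, fit in candidates.items():
        fold_scores = []
        for train, test in stratified_kfold(y, k):
            predict = fit([X[i] for i in train], [y[i] for i in train])
            correct = sum(predict(X[i]) == y[i] for i in test)
            fold_scores.append(correct / len(test))
        scores[name] = sum(fold_scores) / len(fold_scores)
    return max(scores, key=scores.get), scores


# Hypothetical, well-separated toy data: 40 rows of class 0 and 10 rows of class 1.
X = [[i % 8 * 0.1, i // 8 * 0.1] for i in range(40)]
X += [[5 + i % 2 * 0.1, 5 + i // 2 * 0.1] for i in range(10)]
y = [0] * 40 + [1] * 10
candidates = {"majority": majority_model, "nearest_centroid": nearest_centroid_model}
best, scores = select_best(candidates, X, y)
print(best)  # nearest_centroid
```

Stratification matters here because a plain random split of imbalanced data can leave some test folds with few or no minority rows, which makes per-fold scores noisy.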
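The level-2 description, in which the level-1 winner becomes the base model for XGBoost boosting, admits more than one reading. One plausible interpretation, which is an assumption and not necessarily the authors' design, is to start boosting from the base model's predicted log-odds instead of a constant, so that each new tree corrects that model's remaining errors; XGBoost accepts a per-row `base_margin` for this kind of boosting from an existing prediction. The dependency-free sketch below shows the idea with first-order gradient boosting of regression stumps on log-loss, a simplification of XGBoost's second-order, regularized tree boosting. `boost_on_base`, `fit_stump`, and their parameters are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Least-squares regression stump: the single (feature, threshold) split with lowest SSE."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, residuals) if x[f] <= thr]
            right = [r for x, r in zip(X, residuals) if x[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lv, rv)
    if best is None:  # no valid split (all rows identical): predict the mean residual
        c = sum(residuals) / len(residuals)
        return lambda x: c
    _, f, thr, lv, rv = best
    return lambda x: lv if x[f] <= thr else rv


def boost_on_base(X, y, base_proba, n_rounds=20, lr=0.3, eps=1e-6):
    """Gradient boosting on log-loss, initialised from a base model's probabilities."""
    def initial_margin(x):
        p = min(max(base_proba(x), eps), 1 - eps)  # clip before taking log-odds
        return math.log(p / (1 - p))

    margins = [initial_margin(x) for x in X]
    stumps = []
    for _ in range(n_rounds):
        # The negative gradient of log-loss w.r.t. the margin is y - sigmoid(margin).
        residuals = [t - sigmoid(m) for t, m in zip(y, margins)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        margins = [m + lr * stump(x) for m, x in zip(margins, X)]

    def predict_proba(x):
        return sigmoid(initial_margin(x) + lr * sum(s(x) for s in stumps))

    return predict_proba


# Hypothetical uninformative base model (always 0.5) boosted on a 1-D toy problem.
X = [[i] for i in range(10)]
y = [0] * 5 + [1] * 5
model = boost_on_base(X, y, base_proba=lambda x: 0.5)
print(round(model([0]), 2), round(model([9]), 2))
```

With a real level-1 winner, `base_proba` would be that model's predicted probability for the positive class, and the boosting rounds would only have to fix what it gets wrong.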