Multiclass Imbalanced Handling using ADASYN Oversampling and Stacking Algorithm

Yoga Pristyanto, A. F. Nugraha, Akhmad Dahlan, Lucky Adhikrisna Wirasakti, Aditya Ahmad Zein, Irfan Pratama
DOI: 10.1109/IMCOM53663.2022.9721632 (https://doi.org/10.1109/IMCOM53663.2022.9721632)
Published in: 2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Publication date: 2022-01-03
Citations: 3

Abstract

Class imbalance is common in real-world datasets. It arises when the number of instances differs significantly between the classes of a dataset used for classification. In theory, most single classifiers are weak under class imbalance, especially on multiclass datasets, so their performance cannot be maximized. This study proposes two approaches to the multiclass imbalance problem: ADASYN (Adaptive Synthetic) sampling and the stacking algorithm. Tests on five multiclass datasets confirm that the proposed method outperforms other methods in accuracy, sensitivity, specificity, and geometric mean. The proposed method can therefore address class imbalance in multiclass datasets. The study has a limitation, however: the datasets used have at most six classes. Future work should therefore test the method on imbalanced multiclass datasets with more than six classes.
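The abstract names four evaluation measures but does not say how they are computed for more than two classes. The following is a minimal Python sketch under stated assumptions, not the authors' procedure: sensitivity and specificity are taken one-vs-rest per class and macro-averaged, and the geometric mean is taken as the square root of their product. The function name `one_vs_rest_metrics` and the sample labels are hypothetical.

```python
from math import sqrt


def one_vs_rest_metrics(y_true, y_pred):
    """Return accuracy, macro sensitivity, macro specificity, and G-mean.

    Assumption: each class is scored one-vs-rest and the per-class
    values are macro-averaged; the abstract does not specify this.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)

    per_class_sens, per_class_spec = [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        tn = n - tp - fn - fp
        # Guard against classes with no positives or no negatives.
        per_class_sens.append(tp / (tp + fn) if tp + fn else 0.0)
        per_class_spec.append(tn / (tn + fp) if tn + fp else 0.0)

    sensitivity = sum(per_class_sens) / len(labels)
    specificity = sum(per_class_spec) / len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Assumed definition: G-mean = sqrt(sensitivity * specificity).
        "g_mean": sqrt(sensitivity * specificity),
    }


if __name__ == "__main__":
    # Hypothetical three-class example with one minority-class error.
    truth = [0, 0, 0, 0, 1, 1, 2, 2]
    preds = [0, 0, 0, 0, 1, 0, 2, 2]
    for name, value in one_vs_rest_metrics(truth, preds).items():
        print(f"{name}: {value:.4f}")
```

Reporting sensitivity, specificity, and geometric mean alongside accuracy matters on imbalanced data because accuracy alone can stay high while a minority class is largely misclassified.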