IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

Q1 Multidisciplinary
Lilis Yuningsih, Gede Angga Pradipta, Dadang Hermawan, Putu Desiana Wulaning Ayu, Dandy Pramana Hostiadi, Roy Rudolf Huizen
Citations: 0

Abstract

Imbalanced learning problems are a challenge faced by classifiers when data samples have an unbalanced distribution among classes. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the most well-known data pre-processing methods. Problems that arise when oversampling with SMOTE include noise, small disjunct samples, and overfitting caused by a high imbalance ratio in the dataset. A high imbalance ratio combined with low variance causes the generated synthetic data to cluster in narrow areas and in conflicting regions among classes, making machine learning methods susceptible to overfitting during training. Therefore, this research proposes a combination of Radius-SMOTE and the bagging algorithm, called the IRS-BAG model. For each sub-sample generated by bootstrapping, oversampling is performed with Radius-SMOTE; oversampling at the sub-sample level helps to overcome the overfitting that might otherwise occur. Experiments were carried out by comparing the performance of the IRS-BAG model with various previous oversampling methods on imbalanced public datasets. The results, obtained with three different classifiers, showed that every classifier gained a notable improvement when combined with the proposed IRS-BAG model compared with previous state-of-the-art oversampling methods.

DOI: 10.28991/ESJ-2023-07-05-04
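The abstract does not include code, but the procedure it describes (bootstrap a sub-sample, oversample it, train a base learner, aggregate by majority vote) can be sketched in Python. The sketch below assumes scikit-learn and imbalanced-learn are available; since Radius-SMOTE is not shipped by those libraries, standard SMOTE is used as a stand-in at the oversampling step, and the class name `IRSBagSketch` and its parameters are illustrative, not the authors' implementation.

```python
# Minimal sketch of the IRS-BAG idea: bagging in which each bootstrap
# sub-sample is oversampled before a base classifier is trained on it.
# NOTE: standard SMOTE stands in for Radius-SMOTE, which is not available
# in imbalanced-learn; all names here are illustrative assumptions.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample
from imblearn.over_sampling import SMOTE


class IRSBagSketch:
    def __init__(self, base_estimator=None, n_estimators=10, random_state=0):
        self.base_estimator = base_estimator or DecisionTreeClassifier()
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.RandomState(self.random_state)
        self.estimators_ = []
        for _ in range(self.n_estimators):
            # 1) bootstrap a sub-sample (sampling with replacement)
            X_boot, y_boot = resample(
                X, y, replace=True, random_state=rng.randint(1 << 30))
            # 2) oversample the minority class inside the sub-sample
            #    (the paper applies Radius-SMOTE at this step)
            X_res, y_res = SMOTE(
                random_state=rng.randint(1 << 30)).fit_resample(X_boot, y_boot)
            # 3) train one base classifier per resampled sub-sample
            self.estimators_.append(clone(self.base_estimator).fit(X_res, y_res))
        return self

    def predict(self, X):
        # aggregate the ensemble by majority vote
        # (assumes non-negative integer class labels)
        votes = np.stack([est.predict(X) for est in self.estimators_])
        return np.apply_along_axis(
            lambda col: np.bincount(col).argmax(), 0, votes)
```

Under these assumptions the sketch is used like an ordinary classifier, e.g. `IRSBagSketch(n_estimators=25).fit(X_train, y_train).predict(X_test)`; swapping the SMOTE step for a Radius-SMOTE implementation would recover the structure described in the paper.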
Source journal: Emerging Science Journal (Multidisciplinary)
CiteScore: 5.40
Self-citation rate: 0.00%
Articles published: 155
Review time: 10 weeks