A New Hybrid Sampling Approach for Classification of Imbalanced Datasets

Anantaporn Hanskunatai
2018 3rd International Conference on Computer and Communication Systems (ICCCS), published 2018-04-27
DOI: 10.1109/CCOMS.2018.8463228 (https://doi.org/10.1109/CCOMS.2018.8463228)
Citations: 12

Abstract

We live in a data-driven era. Many organizations around the world, including those in banking, industry, commerce, and medicine, aim to extract knowledge from huge volumes of data. However, most real-world datasets suffer from class imbalance. This paper presents a new algorithm for imbalanced classification. The proposed technique is a hybrid sampling approach that combines SMOTE, a well-known oversampling algorithm, with an undersampling technique that removes ambiguous instances from the majority class. Experimental results show that the new hybrid sampling method yields better predictive performance in terms of F-measure than other sampling techniques. In addition, compared with training on the original dataset, it improves F-measure by up to 59.73% with decision tree learning and by up to 412.26% with naïve Bayes classifiers.
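
The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how "ambiguous" is defined or in which order the two steps run. The sketch assumes that a majority instance is ambiguous when at least half of its k nearest neighbours belong to the minority class, and that oversampling happens first. The function names (`nearest_indices`, `smote_like`, `drop_ambiguous_majority`), the parameter values, and the toy data are all hypothetical.

```python
import math
import random

def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling: interpolate between a minority point and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(i, minority, k))
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic

def drop_ambiguous_majority(majority, minority, k=3):
    """Keep majority points with fewer than half of their k neighbours in the minority.

    ASSUMPTION: this neighbourhood vote is an illustrative ambiguity rule,
    not the definition used in the paper.
    """
    points = majority + minority
    is_minority = [False] * len(majority) + [True] * len(minority)
    kept = []
    for i in range(len(majority)):
        votes = sum(is_minority[j] for j in nearest_indices(i, points, k))
        if 2 * votes < k:
            kept.append(majority[i])
    return kept

# Toy 2-D data (made up): one majority point, (1.0, 1.0), sits inside the minority cluster.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3),
            (0.2, 0.4), (0.4, 0.2), (0.5, 0.1), (1.0, 1.0)]
minority = [(0.9, 0.9), (1.1, 1.0), (1.0, 1.2)]

synthetic = smote_like(minority, n_new=3)                  # oversample the minority class
balanced_minority = minority + synthetic
kept_majority = drop_ambiguous_majority(majority, balanced_minority)  # undersample the majority

print(len(kept_majority), "majority and", len(balanced_minority), "minority instances remain")
```

In this toy run the majority point lying inside the minority cluster is dropped while the rest of the majority class is kept, so the classifier sees a cleaner class boundary alongside the enlarged minority class. A production pipeline would more likely rely on an established SMOTE implementation than on this hand-rolled interpolation.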