Handling Imbalanced Dataset Classification in Machine Learning

Seema Yadav, G. Bhole
Published in: 2020 IEEE Pune Section International Conference (PuneCon), 2020-12-16
DOI: 10.1109/PuneCon50868.2020.9362471 (https://doi.org/10.1109/PuneCon50868.2020.9362471)
Citation count: 2

Abstract

Real-world datasets consist mostly of normal instances, with a small percentage of interesting or abnormal instances. The cost of misclassifying an abnormal instance as normal is very high. The majority class is the normal class, whereas the minority class is the abnormal one. Researchers in data mining and machine learning are exploring numerous strategies to resolve the issues associated with unbalanced datasets and the challenges they pose in real life. The irregular class distribution in the dataset is the reason for the decline in classifier performance. There are two main families of methods, algorithm-level and data-level; the most widespread approach at present is the hybrid method. Decision making and overall classification accuracy are affected by the bias toward the majority class. Ensemble techniques are effective. The objective of this study is to provide background on class imbalance issues and on ways to confront the difficulties and challenges of learning from unbalanced data. Experimental results on one dataset indicate that an ensemble technique combined with different data-level strategies offers improved outcomes. This fusion of techniques should benefit several real-life applications, such as intrusion detection, medical diagnosis, and software defect prediction.
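The combination of a data-level strategy with an ensemble can be illustrated with a minimal sketch. The abstract does not name the dataset, the resampling method, or the ensemble used, so everything below is an assumption chosen for illustration: random oversampling stands in for the data-level step, bagging of one-feature threshold stumps stands in for the ensemble, and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, rng):
    """Data-level step (assumed): duplicate random minority rows until classes are balanced."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def fit_stump(X, y):
    """Base learner (assumed): best single threshold t on a 1-D feature, predicting 1 if x > t."""
    best_t, best_err = None, None
    for t in sorted(set(X)):
        err = sum((x > t) != bool(lab) for x, lab in zip(X, y))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_ensemble(X, y, n_models=15, seed=0):
    """Ensemble step (assumed): bagging, with each bootstrap sample oversampled before fitting."""
    rng = random.Random(seed)
    n = len(X)
    thresholds = []
    while len(thresholds) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # bootstrap sample missed one class entirely; redraw
        Xb, yb = random_oversample(Xb, yb, rng)
        thresholds.append(fit_stump(Xb, yb))
    return thresholds


def predict(thresholds, x):
    """Majority vote over the stumps: 1 = abnormal (minority), 0 = normal (majority)."""
    votes = sum(x > t for t in thresholds)
    return int(votes * 2 > len(thresholds))


# Hypothetical toy data: 95 normal instances and 5 abnormal ones on a single feature.
X = [i / 100 for i in range(95)] + [2.0, 2.1, 2.2, 2.3, 2.4]
y = [0] * 95 + [1] * 5

X_bal, y_bal = random_oversample(X, y, random.Random(0))
model = fit_ensemble(X, y)
print(Counter(y_bal))
print(predict(model, 2.2), predict(model, 0.0))
```

Oversampling inside each bootstrap sample, instead of once up front, keeps the ensemble members diverse while still giving every base learner a balanced view of both classes. In practice a library resampler and a stronger base learner would replace these hand-written pieces.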