Authors: Seema Yadav, G. Bhole
Published in: 2020 IEEE Pune Section International Conference (PuneCon), 2020-12-16
DOI: 10.1109/PuneCon50868.2020.9362471 (https://doi.org/10.1109/PuneCon50868.2020.9362471)
Handling Imbalanced Dataset Classification in Machine Learning
Real-world datasets consist mostly of normal instances, with a small percentage of interesting or abnormal instances. The cost of misclassifying an abnormal instance as normal is very high. The majority class is the normal class, whereas the minority class is the abnormal one. Researchers in data mining and machine learning are exploring numerous strategies to resolve the issues associated with imbalanced datasets and the challenges they pose in everyday applications. The skewed class distribution in the dataset is the cause of declining classifier performance. There are two main families of methods, algorithm-level and data-level; the most widespread approach at present is the hybrid method. Bias toward the majority class affects both decision making and overall classification accuracy. Ensemble techniques are effective. The objective of this study is to provide background on class imbalance issues and on ways to address the difficulties and challenges of learning from imbalanced data. Experimental results on one dataset show that an ensemble technique combined with different data-level strategies yields improved outcomes. Such a fusion of techniques should benefit several real-life applications, such as intrusion detection, medical diagnosis, and software defect prediction.
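To make the majority-class bias concrete, here is a minimal Python sketch; it is not the authors' experimental setup. It assumes a toy dataset with 95 normal and 5 abnormal labels and uses hypothetical helper names (`random_oversample`, `accuracy`, `recall`). It shows that a classifier that always predicts "normal" scores high accuracy while detecting no abnormal instance, and then applies random oversampling as one simple example of a data-level strategy.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class instances that were detected."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy data (an assumption for illustration): 95 normal (0), 5 abnormal (1).
y = [0] * 95 + [1] * 5
X = [[float(i)] for i in range(100)]

# A degenerate classifier that is fully biased toward the majority class.
always_normal = [0] * len(y)
baseline_accuracy = accuracy(y, always_normal)
baseline_recall = recall(y, always_normal)

# Data-level remedy: rebalance the training set before fitting a model.
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

print(baseline_accuracy, baseline_recall, dict(balanced_counts))
```

On this toy data, accuracy alone hides the complete failure on the minority class, which is why minority-class recall is worth reporting alongside it. The rebalanced set could then be passed to any classifier or ensemble; that step is left out here because the abstract does not specify the models used.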