Ashmita Roy Medha, Mayur Raj Bharati, P. Baro, M. Borah
A Synthetic Hybrid Approach for Class Imbalance
2022 IEEE Silchar Subsection Conference (SILCON), published 2022-11-04
DOI: 10.1109/SILCON55242.2022.10028811 (https://doi.org/10.1109/SILCON55242.2022.10028811)
Class imbalance is one of the significant challenges in data mining and machine learning. It arises when the samples belonging to one class in a dataset far outnumber the samples of the other classes. The imbalance leads to frequent misclassification and degrades the performance of algorithms used to build real-world applications. As a result, any model trained on an imbalanced dataset is likely to be biased. In this paper, we report a hybrid approach in which we generate a synthetic dataset based on the original dataset and merge the two to form a master dataset. The main objective is to improve accuracy and overall model performance. The effectiveness of our work is shown in terms of precision, recall, and accuracy. Better results were achieved than with the original dataset alone.
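The abstract does not say how the synthetic dataset is generated, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a two-class problem and swaps in a simple stand-in generator: new minority rows are produced by linear interpolation between randomly chosen pairs of existing minority rows. The function names `make_synthetic`, `build_master`, and `precision_recall_accuracy`, and the toy data, are all hypothetical and introduced here for illustration.

```python
import numpy as np


def make_synthetic(X_min, n_new, rng):
    """Assumed generator: interpolate between random pairs of minority rows."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_min), size=n_new)
    t = rng.random((n_new, 1))  # interpolation weights in [0, 1)
    return X_min[i] + t * (X_min[j] - X_min[i])


def build_master(X, y, minority, seed=0):
    """Merge the original data with synthetic minority rows (two-class case)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    # Number of rows needed for the minority class to match the other class.
    n_new = int((y != minority).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    X_syn = make_synthetic(X_min, n_new, rng)
    y_syn = np.full(n_new, minority, dtype=y.dtype)
    # Original rows come first, synthetic rows are appended.
    return np.vstack([X, X_syn]), np.concatenate([y, y_syn])


def precision_recall_accuracy(y_true, y_pred, positive=1):
    """Compute the three metrics named in the abstract for one positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = float(tp / (tp + fp)) if tp + fp else 0.0
    recall = float(tp / (tp + fn)) if tp + fn else 0.0
    accuracy = float(np.mean(y_true == y_pred))
    return precision, recall, accuracy


# Toy imbalanced data (made up for illustration): 90 majority, 10 minority rows.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

X_master, y_master = build_master(X, y, minority=1)
print(np.bincount(y_master))  # class counts in the merged master dataset
```

To compare against the original dataset as the abstract describes, one would train the same classifier once on `(X, y)` and once on `(X_master, y_master)`, then pass each model's predictions on a held-out test set of real samples to `precision_recall_accuracy`. Keeping synthetic rows out of the test set is a design choice of this sketch, made so that the metrics reflect performance on real data.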