SHREA: A Systematic Hybrid Resampling Ensemble Approach Using One Class Classifier
Authors: Pranita Baro, Malaya Dutta Borah
Journal: Computational Intelligence, Volume 40, Issue 6 (published 2024-12-19)
Impact factor: 1.8 (JCR Q3, Computer Science, Artificial Intelligence)
DOI: 10.1111/coin.70004
Citations: 0
Abstract
Imbalanced classification and data incompleteness are two critical problems in machine learning that remain difficult to solve despite significant research. This paper presents the Systematic Hybrid Resampling Ensemble Approach (SHREA), which handles both class imbalance and missing data in a given dataset and improves classification performance. We use an oscillator-guided Factor Based Multiple Imputation Oversampling technique to balance the minority and majority classes while imputing missing values in the dataset. The resulting oversampled dataset is then randomly undersampled to create majority-class and minority-class subsets. Classifiers are trained on these subsets using one of two One Class Classifier methods: One Class Support Vector Machine or Local Outlier Factor. Finally, bootstrap aggregation (bagging) ensembles are built from the majority-class and minority-class classifiers, and their outputs are combined into a score-based prediction. To mimic real-life scenarios in which data may be missing, we introduce random missing values into each imbalanced dataset, creating three new sets per dataset with missing-value rates of 10%, 20%, and 30%. The proposed method is evaluated on datasets from the KEEL website, and the results are compared against RBG, SBG, SBT, DTE, and EUS. Experimental analysis shows that the proposed approach outperforms these existing methods. The Local Outlier Factor variant of SHREA improves Recall, AUC, F-measure, and G-mean by 3.46%, 5.30%, 10.51%, and 9.26%, respectively. The One Class Support Vector Machine variant improves them by 4.82%, 5.95%, 11.03%, and 8.80%, respectively.
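The abstract describes the SHREA pipeline only at a high level. The following is a rough, stdlib-only Python sketch of its stages, and it rests on several assumptions:

- Missing values are injected completely at random.
- Per-class mean imputation followed by SMOTE-style interpolation stands in for the oscillator-guided Factor Based Multiple Imputation Oversampling technique, whose details are not given here.
- A simple k-nearest-neighbour distance scorer (`KNNOneClass`) replaces One Class Support Vector Machine and Local Outlier Factor.
- The score-based prediction compares the averaged scores of the majority and minority ensembles.

Every function and class name is hypothetical, and this is not the authors' implementation.

```python
import math
import random


class KNNOneClass:
    """Hypothetical one-class scorer: negative mean distance to the k nearest
    training points (higher = more typical). A stand-in for OCSVM or LOF."""

    def __init__(self, k=3):
        self.k = k
        self.train = []

    def fit(self, rows):
        self.train = [list(r) for r in rows]
        return self

    def score(self, x):
        dists = sorted(math.dist(x, t) for t in self.train)
        k = min(self.k, len(dists))
        return -sum(dists[:k]) / k


def inject_missing(rows, rate, rng):
    """Set round(rate * #cells) randomly chosen cells to None (MCAR assumption)."""
    out = [list(r) for r in rows]
    cells = [(i, j) for i in range(len(out)) for j in range(len(out[0]))]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        out[i][j] = None
    return out


def impute_mean(rows):
    """Fill None cells with the column mean of the given rows (call once per class)."""
    means = []
    for j in range(len(rows[0])):
        vals = [r[j] for r in rows if r[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def oversample(minority, target, rng):
    """SMOTE-style interpolation between random minority pairs up to `target` rows."""
    out = [list(r) for r in minority]
    while len(out) < target:
        a, b = rng.sample(minority, 2)
        t = rng.random()
        out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return out


def fit_bagged(rows, n_models, sample_size, rng, k=3):
    """Bagging: each model is fit on a bootstrap sample drawn with replacement."""
    return [KNNOneClass(k).fit(rng.choices(rows, k=sample_size)) for _ in range(n_models)]


def predict(x, maj_models, min_models):
    """Score-based decision: 1 (minority) if the minority ensemble finds x more typical."""
    s_maj = sum(m.score(x) for m in maj_models) / len(maj_models)
    s_min = sum(m.score(x) for m in min_models) / len(min_models)
    return 1 if s_min > s_maj else 0


def recall_f_gmean(y_true, y_pred):
    """Recall, F-measure and G-mean, with the minority class labelled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f, math.sqrt(recall * specificity)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy imbalanced data: 40 majority points near (0, 0), 6 minority points near (5, 5).
    maj = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
    mino = [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(6)]
    maj = impute_mean(inject_missing(maj, 0.1, rng))
    mino = impute_mean(inject_missing(mino, 0.1, rng))
    mino = oversample(mino, len(maj), rng)
    maj_models = fit_bagged(maj, n_models=5, sample_size=20, rng=rng)
    min_models = fit_bagged(mino, n_models=5, sample_size=20, rng=rng)
    tests = [([0.2, -0.1], 0), ([4.8, 5.3], 1)]
    preds = [predict(x, maj_models, min_models) for x, _ in tests]
    print(recall_f_gmean([y for _, y in tests], preds))
```

Each ensemble learns only what its own class looks like, so the final label comes from asking which class finds a point more typical. To move closer to the paper, `KNNOneClass` could be swapped for scikit-learn's `OneClassSVM` or `LocalOutlierFactor(novelty=True)`, both of which provide `score_samples`. Raw scores from separately trained models would likely need normalising before they are compared.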
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.