An Effective Online Stream Feature Selection Auxiliary Method for High-Dimensional Unbalanced Data
Authors: Xingtong Qian, Yinghua Zhou
Published in: 2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)
Publication date: 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201246
URL: https://doi.org/10.1109/CCAI57533.2023.10201246
Citations: 0
Abstract
In feature selection from high-dimensional data, online stream feature selection methods have received extensive attention over the past few decades because of their ability to select features online. Existing online stream feature selection methods perform well on many balanced datasets, but real-world datasets are usually both high-dimensional and unbalanced. For example, in medical examination data, sick people make up a much smaller proportion than healthy people. On unbalanced data, traditional stream feature selection algorithms run into problems such as selecting too few features and low classification accuracy. Performing online stream feature selection under high-dimensional, unbalanced conditions is therefore a challenge. This paper proposes a general, easy-to-implement auxiliary algorithm that supplements existing stream feature selection methods and uncovers feature subsets effectively. Experiments on seven high-dimensional, unbalanced datasets show that the auxiliary method improves traditional online stream feature selection methods and enables classifiers to achieve better classification performance.