Zhuo Wang, Huan Li, Bin Nie, Jianqiang Du, Yuwen Du, Yufeng Chen
{"title":"特征选择采用不同的评价策略和随机森林","authors":"Zhuo Wang, Huan Li, Bin Nie, Jianqiang Du, Yuwen Du, Yufeng Chen","doi":"10.1109/ICCEAI52939.2021.00062","DOIUrl":null,"url":null,"abstract":"Aiming at the dimensional disaster and over-fitting problems in data analysis, this paper proposes a feature selection method using hybrid integration of difference models and random forests (Integrate-RF), firstly, Integrate-RF use CART, CHAID, SVM, BN, NN, K-Means, Kohonen to evaluate the importance of features, and then, for the above seven sorts, Integrate-RF use the arithmetic average method to calculate the importance of the features; secondly, Integrate-RF select the most important features from the remaining features into features subset, and use random forest classification to get the corresponding out-of-bag(OOB) data classification error rate; finally, the optimal features subset can be selected based on the OOB data classification error rate. Experiments show that feature selection methods proposed in this paper effectively reduces the data dimension, selects features better and more adaptable.","PeriodicalId":331409,"journal":{"name":"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Feature selection using different evaluate strategy and random forests\",\"authors\":\"Zhuo Wang, Huan Li, Bin Nie, Jianqiang Du, Yuwen Du, Yufeng Chen\",\"doi\":\"10.1109/ICCEAI52939.2021.00062\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Aiming at the dimensional disaster and over-fitting problems in data analysis, this paper proposes a feature selection method using hybrid integration of difference models and random forests (Integrate-RF), firstly, Integrate-RF use CART, CHAID, SVM, BN, NN, K-Means, Kohonen to evaluate the 
importance of features, and then, for the above seven sorts, Integrate-RF use the arithmetic average method to calculate the importance of the features; secondly, Integrate-RF select the most important features from the remaining features into features subset, and use random forest classification to get the corresponding out-of-bag(OOB) data classification error rate; finally, the optimal features subset can be selected based on the OOB data classification error rate. Experiments show that feature selection methods proposed in this paper effectively reduces the data dimension, selects features better and more adaptable.\",\"PeriodicalId\":331409,\"journal\":{\"name\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCEAI52939.2021.00062\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCEAI52939.2021.00062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Feature selection using different evaluate strategy and random forests
To address the curse of dimensionality and over-fitting in data analysis, this paper proposes a feature selection method that combines a hybrid ensemble of different models with random forests (Integrate-RF). First, Integrate-RF uses CART, CHAID, SVM, BN, NN, K-Means, and Kohonen to evaluate feature importance, then combines the seven resulting rankings by arithmetic averaging into a single importance score per feature. Second, Integrate-RF repeatedly moves the most important remaining feature into the candidate feature subset and applies random forest classification to obtain the corresponding out-of-bag (OOB) classification error rate. Finally, the optimal feature subset is the one with the lowest OOB error rate. Experiments show that the proposed method effectively reduces data dimensionality and selects features that are both more discriminative and more adaptable.
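The procedure described in the abstract can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation: the paper averages importances from seven models (CART, CHAID, SVM, BN, NN, K-Means, Kohonen), while this sketch substitutes three scikit-learn estimators that expose importances or coefficients, averages their normalized scores, and then grows the feature subset greedily while tracking random-forest OOB error. All estimator choices and parameter values here are assumptions for demonstration.

```python
# Sketch of the Integrate-RF idea: average feature importances from
# several different models, then pick the subset size minimizing the
# random-forest out-of-bag (OOB) error. Stand-in models, not the paper's seven.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

def averaged_importance(X, y):
    """Arithmetic mean of normalized importance scores from several models."""
    scores = []
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    scores.append(tree.feature_importances_)           # impurity-based (CART-like)
    svm = LinearSVC(random_state=0).fit(X, y)
    scores.append(np.abs(svm.coef_).mean(axis=0))      # |weight| as importance proxy
    lr = LogisticRegression(max_iter=1000).fit(X, y)
    scores.append(np.abs(lr.coef_).mean(axis=0))
    # Normalize each score vector so no single model dominates the average.
    scores = [s / (s.sum() + 1e-12) for s in scores]
    return np.mean(scores, axis=0)

def select_by_oob(X, y, n_estimators=100):
    """Greedily grow the feature subset in importance order; keep the subset
    whose random forest has the lowest OOB classification error."""
    order = np.argsort(averaged_importance(X, y))[::-1]  # most important first
    best_err, best_k = 1.0, 1
    for k in range(1, len(order) + 1):
        rf = RandomForestClassifier(n_estimators=n_estimators, oob_score=True,
                                    random_state=0).fit(X[:, order[:k]], y)
        err = 1.0 - rf.oob_score_
        if err < best_err:
            best_err, best_k = err, k
    return order[:best_k], best_err

# Demo on synthetic data with a few informative features among many.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
subset, err = select_by_oob(X, y)
print(len(subset), round(err, 3))
```

The greedy forward pass mirrors the abstract's "move the most important remaining feature into the subset" loop; on data with few informative features, the selected subset is typically much smaller than the full feature set.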