A feature extraction method for small sample data based on optimal ensemble random forest
Authors: Wei Zhang, H. Zhang
Journal: Journal of Northwestern Polytechnical University (西北工业大学学报), JCR Q3 (Engineering)
Published: 2022-12-01
DOI: https://doi.org/10.1051/jnwpu/20224061261
Citations: 1
Abstract
High-dimensional, small-sample data is a persistent difficulty in data mining. When the traditional random forest algorithm is used for feature selection on such data, overfitting of the classification results makes the feature importance ranking unstable and inaccurate. To address the difficulties random forest faces in dimensionality reduction of small-sample data, a feature extraction algorithm, ote-gwrffs, is proposed. First, the algorithm expands the sample set with a generative adversarial network (GAN) to avoid the overfitting that traditional random forest exhibits in small-sample classification. Then, on the expanded data, a weight-based optimal tree ensemble algorithm is adopted to reduce the impact of data distribution error on feature extraction accuracy and to improve the overall stability of the decision tree ensemble. Finally, the feature importance ranking is obtained by averaging the feature importance measures of the individual decision trees, each weighted by its tree's weight, which addresses the low accuracy and poor stability of feature selection on small-sample data. Using UCI data, the proposed algorithm is compared with the traditional random forest algorithm and the weight-based random forest algorithm. The results show that ote-gwrffs achieves higher stability and accuracy on high-dimensional, small-sample data.
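
The last two steps of the abstract (keeping the best-weighted trees, then averaging per-tree feature importances by tree weight) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how tree weights are computed or how many trees are retained, so the function name `weighted_feature_ranking`, the `keep_fraction` parameter, and the use of a per-tree accuracy score as the weight are all assumptions made for illustration. The GAN-based sample expansion is omitted.

```python
import math

import numpy as np


def weighted_feature_ranking(tree_importances, tree_weights, keep_fraction=0.5):
    """Rank features by a weight-averaged importance over the best trees.

    tree_importances: (n_trees, n_features) importance of each feature per tree.
    tree_weights: (n_trees,) weight per tree, e.g. a validation accuracy
        (assumption; the source does not define the weight).
    keep_fraction: share of highest-weight trees to keep (hypothetical knob).
    """
    importances = np.asarray(tree_importances, dtype=float)
    weights = np.asarray(tree_weights, dtype=float)
    if importances.ndim != 2 or weights.shape != (importances.shape[0],):
        raise ValueError("expected (n_trees, n_features) and (n_trees,) inputs")

    # Keep only the highest-weight trees (the "optimal" sub-ensemble).
    n_keep = max(1, math.ceil(keep_fraction * len(weights)))
    kept = np.argsort(-weights, kind="stable")[:n_keep]
    w = weights[kept]
    if w.sum() <= 0:
        raise ValueError("kept tree weights must sum to a positive value")

    # Weighted average of per-tree importances, one score per feature.
    scores = (w[:, None] * importances[kept]).sum(axis=0) / w.sum()
    # Feature indices ordered from most to least important.
    ranking = np.argsort(-scores, kind="stable")
    return ranking, scores


# Toy input: 4 trees, 3 features; the weights stand in for per-tree accuracy.
toy_importances = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
toy_weights = [0.9, 0.8, 0.5, 0.4]
ranking, scores = weighted_feature_ranking(toy_importances, toy_weights)
print(ranking.tolist())  # [0, 1, 2]
```

In the toy input the two low-weight trees favor feature 2, but they are dropped before averaging, so the ranking follows the two high-weight trees. In practice the importance matrix could come from the individual trees of a fitted forest and the weights from held-out or out-of-bag accuracy, though that choice is again an assumption rather than something the abstract specifies.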