Application of High Dimensional Feature Grouping Method in Near-Infrared Spectra of Identification of Tobacco Growing Areas
Authors: Cheng Zhu, Huili Gong, Zhongren Li, Chunxia Yu
Published in: 2016 3rd International Conference on Information Science and Control Engineering (ICISCE), pp. 230-234
Publication date: 2016-07-08
DOI: 10.1109/ICISCE.2016.58 (https://doi.org/10.1109/ICISCE.2016.58)
Citations: 4
Abstract
To increase classification accuracy, this paper presents a novel feature grouping method based on random forest variable importance measures. We applied the method to classifying the growing areas of tobacco and compared it with other methods. The results showed that the proposed method efficiently obtained the optimal feature subset and can be used to identify tobacco growing areas. The feature grouping divided all features into groups according to their importance scores, as measured by random forest variable importance. The optimal feature subset was built from consecutive groups of important features, while groups of irrelevant features were eliminated, which reduced the difficulty of feature selection. The experimental results demonstrated that the proposed method successfully eliminated the irrelevant features and obtained the optimal feature subset, leading to a significant improvement in classification accuracy.
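The grouping-and-selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how groups are sized or how the subset is searched, so everything below is an assumption made for illustration: features are ranked by importance and split into nearly equal-sized groups, and the subset is chosen by trying the first k groups for each k. The function names `group_features` and `select_subset`, the `evaluate` callback, and the toy importance scores are all hypothetical; in practice the scores might come from a random forest (for example, the `feature_importances_` attribute of a fitted scikit-learn forest) and `evaluate` might be a cross-validated classification accuracy.

```python
from typing import Callable, List, Sequence, Tuple


def group_features(importances: Sequence[float], n_groups: int) -> List[List[int]]:
    """Rank features by importance (descending) and split the ranking into
    n_groups nearly equal-sized groups. Group 0 holds the top-ranked features.
    Equal-sized grouping is an assumption, not the paper's stated rule."""
    if not 1 <= n_groups <= len(importances):
        raise ValueError("n_groups must be between 1 and the number of features")
    ranked = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
    base, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)  # spread the remainder over the first groups
        groups.append(ranked[start:start + size])
        start += size
    return groups


def select_subset(
    groups: List[List[int]],
    evaluate: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Evaluate the union of the first k groups for k = 1..len(groups) and
    return the best-scoring subset together with its score."""
    best_subset: List[int] = []
    best_score = float("-inf")
    subset: List[int] = []
    for group in groups:
        subset = subset + group
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = list(subset), score
    return best_subset, best_score


# Toy usage: hypothetical importance scores for 8 features.
toy_importances = [0.02, 0.30, 0.01, 0.25, 0.05, 0.20, 0.03, 0.14]
toy_groups = group_features(toy_importances, n_groups=4)

# Hypothetical stand-in for classifier accuracy: reward features 1, 3, 5, 7
# and penalize every other feature included in the subset.
RELEVANT = {1, 3, 5, 7}


def toy_evaluate(subset: List[int]) -> float:
    chosen = set(subset)
    return len(chosen & RELEVANT) - 0.5 * len(chosen - RELEVANT)


best, score = select_subset(toy_groups, toy_evaluate)
print(toy_groups)   # [[1, 3], [5, 7], [4, 6], [0, 2]]
print(best, score)  # [1, 3, 5, 7] 4.0
```

Searching over whole groups rather than individual features shrinks the search from one candidate per feature to one candidate per group, which is one plausible reading of how grouping reduces the difficulty of feature selection.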