{"title":"Optimization of Classification Rules and Voting Strategies for Random Forest","authors":"Shishi Huang, Wanrong Gu, Shixin Chen","doi":"10.1109/CCIS53392.2021.9754599","DOIUrl":null,"url":null,"abstract":"As an efficient learning method, random forest is widely used in data mining, machine learning, artificial intelligence and other fields. It has excellent capabilities in specific practice. However, the decision tree model used in the classification process for random forest traverses all attribute values to find the split points, which leads to over-fitting and reduction of algorithm efficiency. In addition, the meta-base models of random forests vote with the same weight, which may result in decreasing algorithm accuracy. In this paper we accomplish the following two optimization tasks. Firstly, the continuous attributes are discretized based on the boundary theorem of Fayyad and Irani. Secondly, Gaussian mixture model is used to adjust the weight of the meta-base models in optimized random forest according to the similarity between the subsets and the training sets. Finally, the optimized algorithm is applied to the student information data set and the terrain types data set. The experiment results show that the optimized algorithm can effectively improve the classification efficiency and prediction accuracy.","PeriodicalId":191226,"journal":{"name":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS53392.2021.9754599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
As an efficient learning method, random forest is widely used in data mining, machine learning, artificial intelligence, and other fields, and it performs well in many practical applications. However, the decision tree model used in the classification process of random forest traverses all attribute values to find split points, which leads to over-fitting and reduces algorithm efficiency. In addition, the base models of a random forest vote with equal weight, which may decrease accuracy. In this paper we carry out two optimizations. First, continuous attributes are discretized based on Fayyad and Irani's boundary theorem. Second, a Gaussian mixture model is used to adjust the weights of the base models in the optimized random forest according to the similarity between each subset and the training set. Finally, the optimized algorithm is applied to a student information data set and a terrain types data set. The experimental results show that the optimized algorithm effectively improves classification efficiency and prediction accuracy.
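To make the weighted-voting idea concrete, below is a minimal sketch in Python of one plausible reading of the second optimization: a Gaussian mixture model is fitted on the full training set, each tree's bootstrap subset is scored against it, and the resulting similarity scores become voting weights. The dataset, the number of mixture components, and the use of mean log-likelihood as the similarity measure are all illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch: GMM-similarity-weighted voting for a random forest.
# Assumptions (not from the paper): iris data, 3 mixture components,
# mean log-likelihood as the subset-vs-training-set similarity score.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Fit a GMM on the full training set; it serves as the reference distribution.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

n_trees = 25
trees, weights = [], []
for _ in range(n_trees):
    # Bootstrap subset for this tree.
    idx = rng.integers(0, len(X), size=len(X))
    X_b, y_b = X[idx], y[idx]
    trees.append(DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_b, y_b))
    # Similarity proxy: mean log-likelihood of the subset under the training-set GMM.
    weights.append(gmm.score(X_b))

# Normalize scores into positive voting weights (softmax-style scaling).
weights = np.exp(np.asarray(weights) - np.max(weights))
weights /= weights.sum()

def predict(X_new):
    """Weighted majority vote across the trees."""
    votes = np.zeros((len(X_new), len(np.unique(y))))
    for tree, w in zip(trees, weights):
        votes[np.arange(len(X_new)), tree.predict(X_new)] += w
    return votes.argmax(axis=1)

print("training accuracy:", (predict(X) == y).mean())
```

Trees grown on subsets that look less like the overall training distribution receive smaller weights, so their votes count for less in the final decision, which is the intuition the abstract attributes to the GMM-based weighting step.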