{"title":"Optimization of Classification Rules and Voting Strategies for Random Forest","authors":"Shishi Huang, Wanrong Gu, Shixin Chen","doi":"10.1109/CCIS53392.2021.9754599","DOIUrl":null,"url":null,"abstract":"As an efficient learning method, random forest is widely used in data mining, machine learning, artificial intelligence and other fields. It has excellent capabilities in specific practice. However, the decision tree model used in the classification process for random forest traverses all attribute values to find the split points, which leads to over-fitting and reduction of algorithm efficiency. In addition, the meta-base models of random forests vote with the same weight, which may result in decreasing algorithm accuracy. In this paper we accomplish the following two optimization tasks. Firstly, the continuous attributes are discretized based on the boundary theorem of Fayyad and Irani. Secondly, Gaussian mixture model is used to adjust the weight of the meta-base models in optimized random forest according to the similarity between the subsets and the training sets. Finally, the optimized algorithm is applied to the student information data set and the terrain types data set. The experiment results show that the optimized algorithm can effectively improve the classification efficiency and prediction accuracy.","PeriodicalId":191226,"journal":{"name":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS53392.2021.9754599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
As an efficient learning method, random forest is widely used in data mining, machine learning, artificial intelligence, and other fields, and it performs well in many practical applications. However, the decision tree model used in the classification process of random forest traverses all attribute values to find split points, which leads to over-fitting and reduces algorithm efficiency. In addition, the base models of a random forest vote with equal weight, which may decrease accuracy. In this paper we carry out two optimizations. First, continuous attributes are discretized based on Fayyad and Irani's boundary theorem. Second, a Gaussian mixture model is used to adjust the weights of the base models in the optimized random forest according to the similarity between each subset and the training set. Finally, the optimized algorithm is applied to a student information data set and a terrain types data set. The experimental results show that the optimized algorithm effectively improves classification efficiency and prediction accuracy.
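To make the weighted-voting idea concrete, below is a minimal sketch in Python of one plausible reading of the second optimization: a Gaussian mixture model is fitted on the full training set, each tree's bootstrap subset is scored against it, and the resulting similarity scores become voting weights. The dataset, the number of mixture components, and the use of mean log-likelihood as the similarity measure are all illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch: GMM-similarity-weighted voting for a random forest.
# Assumptions (not from the paper): iris data, 3 mixture components,
# mean log-likelihood as the subset-vs-training-set similarity score.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

# Fit a GMM on the full training set; it serves as the reference distribution.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

n_trees = 25
trees, weights = [], []
for _ in range(n_trees):
    # Bootstrap subset for this tree.
    idx = rng.integers(0, len(X), size=len(X))
    X_b, y_b = X[idx], y[idx]
    trees.append(DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_b, y_b))
    # Similarity proxy: mean log-likelihood of the subset under the training-set GMM.
    weights.append(gmm.score(X_b))

# Normalize scores into positive voting weights (softmax-style scaling).
weights = np.exp(np.asarray(weights) - np.max(weights))
weights /= weights.sum()

def predict(X_new):
    """Weighted majority vote across the trees."""
    votes = np.zeros((len(X_new), len(np.unique(y))))
    for tree, w in zip(trees, weights):
        votes[np.arange(len(X_new)), tree.predict(X_new)] += w
    return votes.argmax(axis=1)

print("training accuracy:", (predict(X) == y).mean())
```

Trees grown on subsets that look less like the overall training distribution receive smaller weights, so their votes count for less in the final decision, which is the intuition the abstract attributes to the GMM-based weighting step.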