Optimization of Classification Rules and Voting Strategies for Random Forest

Shishi Huang, Wanrong Gu, Shixin Chen
{"title":"Optimization of Classification Rules and Voting Strategies for Random Forest","authors":"Shishi Huang, Wanrong Gu, Shixin Chen","doi":"10.1109/CCIS53392.2021.9754599","DOIUrl":null,"url":null,"abstract":"As an efficient learning method, random forest is widely used in data mining, machine learning, artificial intelligence and other fields. It has excellent capabilities in specific practice. However, the decision tree model used in the classification process for random forest traverses all attribute values to find the split points, which leads to over-fitting and reduction of algorithm efficiency. In addition, the meta-base models of random forests vote with the same weight, which may result in decreasing algorithm accuracy. In this paper we accomplish the following two optimization tasks. Firstly, the continuous attributes are discretized based on the boundary theorem of Fayyad and Irani. Secondly, Gaussian mixture model is used to adjust the weight of the meta-base models in optimized random forest according to the similarity between the subsets and the training sets. Finally, the optimized algorithm is applied to the student information data set and the terrain types data set. The experiment results show that the optimized algorithm can effectively improve the classification efficiency and prediction accuracy.","PeriodicalId":191226,"journal":{"name":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS53392.2021.9754599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Random forest is an efficient learning method that is widely used in data mining, machine learning, artificial intelligence, and other fields, and it performs well in practice. However, the decision tree model used in the random forest classification process traverses all attribute values to find split points, which leads to over-fitting and reduces algorithm efficiency. In addition, the base models of a random forest vote with equal weight, which may lower the algorithm's accuracy. In this paper we carry out two optimizations. First, continuous attributes are discretized based on the boundary theorem of Fayyad and Irani. Second, a Gaussian mixture model is used to adjust the weights of the base models in the optimized random forest according to the similarity between their training subsets and the full training set. Finally, the optimized algorithm is applied to a student information data set and a terrain types data set. The experimental results show that the optimized algorithm effectively improves classification efficiency and prediction accuracy.
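The first optimization rests on Fayyad and Irani's boundary theorem, which says that entropy-minimizing cut points for a continuous attribute always lie at boundary points, i.e. midpoints between adjacent (sorted) examples whose classes differ, so only those points need to be evaluated rather than every attribute value. The Python sketch below illustrates the idea for a single cut; the function names and the single-split simplification are illustrative and not taken from the paper.

```python
import numpy as np

def class_entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_boundary_cut(values, labels):
    """Pick the boundary cut point that minimizes the weighted class entropy.

    Per the boundary theorem, only midpoints between adjacent examples of
    different classes are considered, which prunes the candidate set sharply.
    """
    order = np.argsort(values)
    v, y = np.asarray(values)[order], np.asarray(labels)[order]

    best_cut, best_ent = None, np.inf
    for i in range(1, len(v)):
        # Candidate cuts only where the class label changes (boundary points).
        if y[i] != y[i - 1] and v[i] != v[i - 1]:
            cut = (v[i] + v[i - 1]) / 2.0
            left, right = y[:i], y[i:]
            ent = (len(left) * class_entropy(left) +
                   len(right) * class_entropy(right)) / len(y)
            if ent < best_ent:
                best_cut, best_ent = cut, ent
    return best_cut

# Toy example: only two boundary midpoints are examined, not all six values.
x = np.array([1.0, 1.2, 2.5, 2.7, 3.9, 4.1])
y = np.array([0, 0, 1, 1, 0, 0])
print(best_boundary_cut(x, y))  # prints 1.85 for this toy data
```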
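The second optimization replaces equal-weight voting with weights derived from a Gaussian mixture model fitted on the training set: each base tree's bootstrap subset is scored by the GMM, and subsets that look more like the training distribution earn larger voting weights. The sketch below assumes scikit-learn's GaussianMixture and DecisionTreeClassifier; the softmax over mean log-likelihoods is an illustrative way to map similarity scores to weights and is not necessarily the paper's exact formulation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.tree import DecisionTreeClassifier

def fit_weighted_forest(X_train, y_train, n_trees=10, n_components=3, seed=0):
    """Fit trees on bootstrap subsets and weight each tree by how similar its
    subset is to the full training set under a GMM (mean log-likelihood)."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_train)

    trees, scores = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap sample
        Xb, yb = X_train[idx], y_train[idx]
        trees.append(DecisionTreeClassifier(random_state=seed).fit(Xb, yb))
        scores.append(gmm.score(Xb))  # mean log-likelihood of the subset

    scores = np.asarray(scores)
    weights = np.exp(scores - scores.max())  # higher similarity -> larger weight
    weights /= weights.sum()
    return trees, weights

def weighted_vote(trees, weights, X, classes):
    """Aggregate hard votes, scaled by each tree's weight."""
    col = {c: j for j, c in enumerate(classes)}
    votes = np.zeros((len(X), len(classes)))
    for tree, w in zip(trees, weights):
        for i, pred in enumerate(tree.predict(X)):
            votes[i, col[pred]] += w
    return classes[votes.argmax(axis=1)]
```

A call such as trees, weights = fit_weighted_forest(X_train, y_train) followed by weighted_vote(trees, weights, X_test, np.unique(y_train)) corresponds to the weighted majority vote described in the abstract.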