Statistical analysis of various splitting criteria for decision trees

IF 0.8 Q4 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Fadwa Aaboub, Hasna Chamlal, Tayeb Ouaderhman
Published 2023-01-01 in the Journal of Algorithms & Computational Technology. DOI: 10.1177/17483026231198181 (https://doi.org/10.1177/17483026231198181)

Abstract

Decision trees are frequently used to solve classification problems in data mining and machine learning, owing to their many advantages, including a clear and simple structure, high quality, and robustness. Decision tree algorithms follow a top-down partitioning strategy and differ in the attribute selection criterion they use, and their effectiveness depends on the choice of splitting method. In this work, six decision tree algorithms based on six different attribute evaluation metrics are therefore compared. The algorithms were chosen to cover four categories of splitting criteria: criteria based on information theory, criteria based on distance, statistical criteria, and other splitting criteria. They are iterative dichotomizer 3 (ID3; first category), C4.5 (first category), classification and regression trees (CART; second category), the Pearson's correlation coefficient based decision tree (third category), dispersion ratio (third category), and the feature weight based decision tree algorithm (last category). The six procedures are assessed on eleven data sets in terms of classification accuracy, tree depth, number of leaf nodes, and tree construction time. The Friedman test and the post hoc Nemenyi test are then used to examine the results. These two tests indicate that the ID3 and CART decision tree methods perform better than the other decision tree methods.
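
To make the notion of a splitting criterion concrete, the following is a minimal sketch of three textbook criteria: information gain (the criterion usually associated with ID3), gain ratio (usually associated with C4.5), and Gini impurity decrease (usually associated with CART). It is not the authors' implementation. The function names and the toy data are hypothetical, and the Gini version uses a multiway split on a categorical attribute for simplicity, whereas CART itself builds binary splits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group class labels by the attribute value of each sample."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    if split_info > 0:
        return information_gain(values, labels) / split_info
    return 0.0  # attribute takes a single value: no usable split


def gini_decrease(values, labels):
    """Gini impurity reduction obtained by splitting on the attribute."""
    n = len(labels)
    remainder = sum(len(g) / n * gini(g) for g in partition(values, labels))
    return gini(labels) - remainder


# Hypothetical toy data: one attribute separates the classes, the other does not.
classes = [0, 0, 1, 1]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]

for name, attr in [("informative", informative), ("uninformative", uninformative)]:
    print(name, information_gain(attr, classes),
          gain_ratio(attr, classes), gini_decrease(attr, classes))
```

In a top-down tree builder, a score of this kind would be computed for every candidate attribute at a node, and the attribute with the highest score would be chosen for the split. Swapping the scoring function is what distinguishes the criteria in this sketch.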
Source journal: Journal of Algorithms & Computational Technology (Computer Science, Interdisciplinary Applications). CiteScore: 1.70. Self-citation rate: 0.00%. Articles published: 8. Review time: 15 weeks.