Enhancing Weak Nodes in Decision Tree Algorithm Using Data Augmentation

IF 1.2 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS
Youness Manzali, Mohamed El far, M. Chahhou, Mohammed Elmohajir
Cybernetics and Information Technologies, published 2022-06-01
DOI: 10.2478/cait-2022-0016 (https://doi.org/10.2478/cait-2022-0016)
Citations: 1

Abstract

Decision trees are among the most popular classifiers in machine learning, artificial intelligence, and pattern recognition because they are accurate and easy to interpret. During tree construction, a node containing too few observations (a weak node) can still be split, and the resulting split is unreliable and has no statistical value. Existing methods such as pruning address this issue by removing the parts of the tree that are not meaningful. This paper treats weak nodes differently: we introduce a new algorithm, Enhancing Weak Nodes in Decision Tree (EWNDT), which reinforces weak nodes by augmenting their data with observations from other, similar tree nodes. We call this data augmentation a virtual merging because we merge the data only temporarily, to recalculate the best splitting attribute and the best threshold in the weak node. We use two approaches to define the similarity between two nodes. The algorithm is evaluated on benchmark datasets from the UCI machine-learning repository, and the results indicate that EWNDT performs well.
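The virtual-merging idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how node similarity is defined, so the centroid-distance similarity, the Gini criterion, the min_samples cutoff, and every function name below are assumptions chosen only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best

def centroid(rows):
    """Per-feature mean of the observations in a node."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def most_similar(weak_rows, candidates):
    """Index of the candidate node whose centroid is closest to the weak node's.
    ASSUMPTION: centroid distance stands in for the paper's similarity measures."""
    c = centroid(weak_rows)
    def dist(rows):
        return sum((a - b) ** 2 for a, b in zip(c, centroid(rows)))
    return min(range(len(candidates)), key=lambda i: dist(candidates[i][0]))

def virtual_merge_split(weak, candidates, min_samples=5):
    """Choose a split for a node, borrowing data from a similar node if it is weak.
    ASSUMPTION: a node is 'weak' when it holds fewer than min_samples observations."""
    rows, labels = weak
    if len(rows) >= min_samples:
        return best_split(rows, labels)
    donor_rows, donor_labels = candidates[most_similar(rows, candidates)]
    # The merged data are used only to pick the split; the weak node's own
    # observations are left unchanged (the merge is "virtual").
    return best_split(rows + donor_rows, labels + donor_labels)

weak_node = ([[1.0, 5.0], [3.0, 1.0]], [0, 1])
other_nodes = [
    ([[1.0, 4.0], [1.5, 5.0], [3.0, 0.5], [3.5, 1.0]], [0, 0, 1, 1]),
    ([[10.0, 10.0], [11.0, 11.0]], [0, 1]),
]
print(virtual_merge_split(weak_node, other_nodes))
```

With only two observations, the weak node alone would place its threshold at the single midpoint between them; after borrowing from the nearer node, the threshold is chosen from six observations instead.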
Source journal
Cybernetics and Information Technologies (Computer Science, Information Systems)
CiteScore: 3.20
Self-citation rate: 25.00%
Articles published: 35
Review time: 12 weeks