IMPLEMENTATION OF THE C4.5 DECISION TREE ALGORITHM WITH MEAN AND MEDIAN IMPROVISATION ON NUMERICAL DATASETS

Neni Febiani, Abdullah Fauzan, Muhamat Maariful Huda
Jurnal Teknik Informasi dan Komputer (Tekinkom), published 2022-06-30. DOI: 10.37600/tekinkom.v5i1.435 (https://doi.org/10.37600/tekinkom.v5i1.435)

Abstract

The decision tree is a classification method in data mining, and C4.5 is one algorithm for building decision trees. A C4.5 model is easy to understand because it is expressed as a tree structure. However, C4.5 is often less efficient and effective when handling quantitative (numerical) data. To address this, this study applies a mean and median improvisation to the numerical attributes of the dataset during preprocessing. The improvisation is used to obtain a threshold value, with the aim of minimizing information loss and time complexity when implementing the C4.5 decision tree on the training data. The system is evaluated with a confusion matrix, which serves as the benchmark for testing the classification method on the test data. The dataset was partitioned into three scenarios. In scenario 1, with 70% training data and 20% test data, the highest accuracy was 75%. The results indicate that the mean and median improvisation on numerical attributes can be used with the C4.5 algorithm in this scenario.
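The abstract does not spell out how the mean and median are turned into a threshold, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that the mean and the median of a numerical attribute each serve as a candidate binary split point, and that the candidate with the higher C4.5 gain ratio is kept. The function names (`entropy`, `gain_ratio`, `best_threshold`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5 gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a one-sided split carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    info_gain = entropy(labels) - (weights[0] * entropy(left) + weights[1] * entropy(right))
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Assumption: only the mean and the median are tried as thresholds."""
    candidates = {"mean": mean(values), "median": median(values)}
    scores = {name: gain_ratio(values, labels, t) for name, t in candidates.items()}
    best = max(scores, key=scores.get)
    return best, candidates[best], scores[best]


# Hypothetical toy attribute with an outlier that pulls the mean upward.
values = [1, 2, 3, 4, 10]
labels = ["a", "a", "a", "b", "b"]
print(best_threshold(values, labels))
```

Trying only two candidate thresholds per attribute, instead of every midpoint between sorted values as standard C4.5 does, is what would reduce the search cost under this reading. Whether it also limits information loss depends on the data.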