Classification Comparison Performance of Supervised Machine Learning Random Forest and Decision Tree Algorithms Using Confusion Matrix

Jurnal Sisfokom (Sistem Informasi dan Komputer) Pub Date : 2024-02-15 DOI:10.32736/sisfokom.v13i1.1985

Ellya Helmud, Fitriyani Fitriyani, Parlia Romadiana



Abstract

Classification is a data mining technique used to predict outcomes for both existing and future cases. It is a supervised method, so it requires a labeled dataset. A random forest builds several decision trees and combines their outputs to obtain better and more precise predictions, while a decision tree turns a collection of data into a tree structure that represents the rules of a decision. This study compares the prediction accuracy of the two methods on the same dataset, a dataset of employees in India, for the task of predicting whether an employee leaves the job or remains with the company. The dataset contains 4654 records, of which 90% were used as training data and 10% as test data. In testing, the random forest method reached an accuracy of 86.45%, while the decision tree method reached 84.30%. A confusion matrix was then used to show the distribution of correct and incorrect predictions and to calculate precision, recall (sensitivity), specificity, and F1-score. The random forest algorithm obtained a precision of 96.7%, a sensitivity of 84.7%, a specificity of 91.4%, and an F1-score of 90.2%. The decision tree algorithm obtained a precision of 95.7%, a sensitivity of 82.9%, a specificity of 88.4%, and an F1-score of 88.8%.
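The metrics named in the abstract all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions in plain Python. The class name `ConfusionMatrix`, the helper `f1_from`, and the counts in `example` are assumptions made for illustration only; the counts are not the paper's data, and the authors' actual tooling is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix (hypothetical helper, not from the paper)."""
    tp: int  # positives predicted as positive
    fp: int  # negatives predicted as positive
    fn: int  # positives predicted as negative
    tn: int  # negatives predicted as negative

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        # Recall is also called sensitivity.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def f1(self) -> float:
        return f1_from(self.precision(), self.recall())


def f1_from(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only; these are NOT taken from the paper.
example = ConfusionMatrix(tp=80, fp=10, fn=20, tn=90)

if __name__ == "__main__":
    print(f"accuracy    = {example.accuracy():.3f}")
    print(f"precision   = {example.precision():.3f}")
    print(f"recall      = {example.recall():.3f}")
    print(f"specificity = {example.specificity():.3f}")
    print(f"F1-score    = {example.f1():.3f}")
```

As a consistency check on the reported figures, `f1_from(0.967, 0.847)` evaluates to about 0.903 and `f1_from(0.957, 0.829)` to about 0.888, which is in line with the stated F1-scores of 90.2% and 88.8% once rounding of the published precision and sensitivity values is allowed for. The methods above do not guard against empty denominators, so a matrix with no positive predictions would need separate handling.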

