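Comparison of Error Rate Prediction Methods in Classification Modeling with Classification and Regression Tree (CART) Methods for Balanced Data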

UNP Journal of Statistics and Data Science. Publication date: 2023-08-28. DOI: 10.24036/ujsds/vol1-iss4/73

Fitria Panca Ramadhani, Dodi Vionanda, Syafriandi Syafriandi, Admi Salma


Citations: 0

Abstract



CART (Classification and Regression Tree) is a classification algorithm in the decision tree family. A CART model is a tree consisting of a root node, internal nodes, and terminal nodes. Once the model has been built, its accuracy must be assessed in order to evaluate its performance. This is done by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are considered here: Leave One Out Cross Validation (LOOCV), Hold Out (HO), and K-Fold Cross Validation. The methods differ in how they divide the data into training and testing sets, so each has its own advantages and disadvantages. The three methods were therefore compared to determine which is most appropriate for the CART algorithm. The comparison considered several factors, including variations in the mean, the number of variables, and the correlations in normally distributed random data. The results were examined with boxplots, looking at the median error rate and the variance of each method. The study indicates that K-Fold Cross Validation performs best on median error rate and has the lowest variance, making it the most suitable error rate prediction method for CART.
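The three ways of dividing data described in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it makes several assumptions that the abstract does not state: a depth-one Gini tree (`fit_stump`) stands in for a full CART model, the simulated variables are independent rather than correlated, and the sample size, the 30% hold-out fraction, `k=5`, and the number of repetitions are arbitrary illustrative choices. All function names are hypothetical.

```python
import random
from statistics import median, variance

def make_balanced_data(n_per_class, mean_shift, n_vars, rng):
    """Two equally sized classes of normal data; class 1 is shifted by mean_shift.
    Assumption: variables are independent (the study also varied correlation)."""
    data = [([rng.gauss(label * mean_shift, 1.0) for _ in range(n_vars)], label)
            for label in (0, 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    return data

def gini(labels):
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    return 1 if 2 * sum(labels) >= len(labels) else 0

def fit_stump(train):
    """Depth-one tree (a simplified stand-in for CART): choose the single
    (variable, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for j in range(len(train[0][0])):
        for x, _ in train:
            t = x[j]
            left = [y for xi, y in train if xi[j] <= t]
            right = [y for xi, y in train if xi[j] > t]
            if not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    _, j, t, left_label, right_label = best
    return lambda x: left_label if x[j] <= t else right_label

def hold_out_splits(n, rng, test_fraction=0.3):
    """Hold Out: one random train/test split."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(n * test_fraction))
    yield idx[cut:], idx[:cut]

def k_fold_splits(n, rng, k=5):
    """K-Fold: every observation is tested exactly once, in one of k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        yield [i for i in idx if i not in held_out], test

def loocv_splits(n):
    """LOOCV: n splits, each testing on a single observation."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def error_rate(data, splits):
    """Fraction of test observations misclassified, pooled over all splits."""
    wrong = total = 0
    for train_idx, test_idx in splits:
        predict = fit_stump([data[i] for i in train_idx])
        wrong += sum(predict(data[i][0]) != data[i][1] for i in test_idx)
        total += len(test_idx)
    return wrong / total

def simulate(reps=20, n_per_class=20, mean_shift=1.0, n_vars=2, seed=0):
    """Repeat the experiment and summarise each method by (median, variance)."""
    rng = random.Random(seed)
    rates = {"HO": [], "K-Fold": [], "LOOCV": []}
    for _ in range(reps):
        data = make_balanced_data(n_per_class, mean_shift, n_vars, rng)
        n = len(data)
        rates["HO"].append(error_rate(data, hold_out_splits(n, rng)))
        rates["K-Fold"].append(error_rate(data, k_fold_splits(n, rng)))
        rates["LOOCV"].append(error_rate(data, loocv_splits(n)))
    return {name: (median(v), variance(v)) for name, v in rates.items()}

if __name__ == "__main__":
    for name, (med, var) in simulate().items():
        print(f"{name}: median error rate = {med:.3f}, variance = {var:.5f}")
```

The median and variance printed for each method are the same two quantities the study reads off its boxplots. The numbers from this toy setup should not be taken as a reproduction of the paper's findings.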

