Comparison of Error Prediction Methods in Classification Modeling with CHAID Methods for Balanced Data

Findri Wara Putri, Dodi Vionanda, Atus Amadi putra, Fadhilah Fitri
UNP Journal of Statistics and Data Science. Published 2023-11-30. DOI: 10.24036/ujsds/vol1-iss5/116 (https://doi.org/10.24036/ujsds/vol1-iss5/116)

Abstract

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method that classifies data by building classification trees, and the classification results are displayed as a tree diagram. Once the model has been built, its performance is assessed by estimating its prediction error rate. Error rate estimation works by dividing the data into training data and testing data. Three such methods are considered here: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. The methods differ in how they divide the data into training and testing sets, so each has its own advantages and disadvantages. The three methods were therefore compared to determine which is most appropriate for CHAID. This is an experimental study that uses simulated data generated in RStudio. The comparison considers several factors, namely the marginal probability matrix and different correlations. The results are examined with boxplots, looking at the median error rate and the lowest variance. The study found that k-fold cross-validation is the most suitable error rate prediction method for CHAID on balanced data.
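The three ways of splitting data described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the study used simulated data generated in RStudio, and none of the settings below come from the paper. The data generator, the category weights, the 30% hold-out fraction, k = 10, and the number of repetitions are all assumptions chosen for illustration, and the classifier is a simple stand-in (majority class per predictor category), not CHAID.

```python
import random
import statistics
from collections import Counter, defaultdict

def make_balanced_data(n_per_class, seed=0):
    """Simulate balanced binary data with one categorical predictor (hypothetical setup)."""
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        # Assumed weights: category "a" favours class 0, "c" favours class 1.
        weights = (0.6, 0.3, 0.1) if label == 0 else (0.1, 0.3, 0.6)
        for _ in range(n_per_class):
            rows.append((rng.choices("abc", weights=weights)[0], label))
    rng.shuffle(rows)
    return rows

def fit(train):
    """Stand-in classifier (NOT CHAID): majority class within each predictor category."""
    counts, overall = defaultdict(Counter), Counter()
    for x, y in train:
        counts[x][y] += 1
        overall[y] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen categories
    return {x: c.most_common(1)[0][0] for x, c in counts.items()}, default

def error_rate(model, test):
    """Fraction of test rows that the model misclassifies."""
    table, default = model
    return sum(table.get(x, default) != y for x, y in test) / len(test)

def holdout_error(data, test_fraction=0.3):
    """Hold-out: a single split into one test part and one training part."""
    n_test = max(1, int(len(data) * test_fraction))
    return error_rate(fit(data[n_test:]), data[:n_test])

def kfold_error(data, k=10):
    """k-fold: each fold serves once as test data; fold error rates are averaged."""
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(error_rate(fit(train), folds[i]))
    return sum(errors) / k

def loocv_error(data):
    """LOOCV: k-fold with k equal to the number of observations."""
    return kfold_error(data, k=len(data))

def compare(n_reps=20, n_per_class=50):
    """Repeat the simulation; report (median, variance) of each estimator's error rate."""
    results = {"hold-out": [], "k-fold": [], "LOOCV": []}
    for seed in range(n_reps):
        data = make_balanced_data(n_per_class, seed)
        results["hold-out"].append(holdout_error(data))
        results["k-fold"].append(kfold_error(data))
        results["LOOCV"].append(loocv_error(data))
    return {name: (statistics.median(v), statistics.variance(v))
            for name, v in results.items()}

if __name__ == "__main__":
    for name, (median, variance) in compare().items():
        print(f"{name}: median={median:.3f}, variance={variance:.5f}")
```

The median and variance returned by `compare` correspond to the two quantities the abstract reads off the boxplots. The numbers from this sketch say nothing about CHAID itself, because the classifier and the data-generating settings are placeholders.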