{"title":"Comparison of Error Rate Prediction Methods in Binary Logistic Regression Model for Balanced Data","authors":"None Shavira Asysyifa S, None Dodi Vionanda, None Nonong Amalita, None Dina Fitria","doi":"10.24036/ujsds/vol1-iss4/90","DOIUrl":null,"url":null,"abstract":"Binary Logistic Regression is one of the statistical methods that can be used to see the relations between dependent variable with some independent variables, where the dependent variable split into two categories, namely the category declaring a successful event and the category declaring a failed event. The performance of binary logistic regression can be seen from the accurary of the model. Accuracy can be measured by predicting the error rate. One method that can be used to predict error rate is cross validation. The cross validation method works by dividing the data into two parts, namely testing data and training data. Cross validation has several learning methods that are commonly used, namely Leave One Out (LOO), Hold out, and K-fold cross validation. LOO has unbiased estimation of accuracy but take a long time, hold out can avoid overfitting and works faster because no iterations, and k-fold cross validation has smaller error rate prediction. Meanwhile, data cases with different correlation are useful to find out the different correlations effect performance of error rate prediction method. In this study uses artificially generated data with a normal distribution, including univariate, bivariate, and multivariate datasets with various combination of mean differences and correlation. Considering these factors, this study focuses on comparing the three cross validation methods for predicting error rate prediction in binary logistic regression. This study finds out that k-fold cross validation method is the most suitable method to predict errors in logistic regression modeling for balanced data.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"UNP Journal of Statistics and Data Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.24036/ujsds/vol1-iss4/90","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Binary logistic regression is a statistical method used to examine the relationship between a dependent variable and a set of independent variables, where the dependent variable is split into two categories: one declaring a successful event and one declaring a failed event. The performance of a binary logistic regression model can be assessed through its accuracy, and accuracy can be measured by predicting the error rate. One method for predicting the error rate is cross validation. Cross validation works by dividing the data into two parts, namely training data and testing data. Several cross validation schemes are commonly used, namely leave-one-out (LOO), hold-out, and k-fold cross validation. LOO gives an approximately unbiased estimate of accuracy but takes a long time; hold-out can avoid overfitting and runs faster because it requires no iterations; and k-fold cross validation tends to give a smaller error rate prediction. Meanwhile, data cases with different correlations are useful for finding out how correlation affects the performance of each error rate prediction method. This study uses artificially generated, normally distributed data, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Considering these factors, this study focuses on comparing the three cross validation methods for predicting the error rate in binary logistic regression. The study finds that k-fold cross validation is the most suitable method for predicting the error rate in logistic regression modeling of balanced data.
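To make the comparison described in the abstract concrete, the sketch below estimates the error rate of a binary logistic regression model with hold-out, k-fold, and leave-one-out cross validation on simulated bivariate normal data. This is not the authors' code; the sample size, mean shift, correlation, and split settings are illustrative assumptions only.

```python
# Illustrative sketch (assumptions, not the paper's implementation): compare
# hold-out, k-fold, and leave-one-out error-rate estimates for binary logistic
# regression on balanced, bivariate normal data with a chosen mean shift and
# correlation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, cross_val_score, train_test_split)

rng = np.random.default_rng(0)
n, delta, rho = 200, 1.0, 0.5            # per-class size, mean shift, correlation (assumed)
cov = np.array([[1.0, rho], [rho, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, n),
               rng.multivariate_normal([delta, delta], cov, n)])
y = np.repeat([0, 1], n)                 # balanced classes

model = LogisticRegression()

# Hold-out: a single stratified train/test split, so no iteration is needed.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
holdout_err = 1 - model.fit(X_tr, y_tr).score(X_te, y_te)

# K-fold cross validation: average error over k = 10 folds.
kfold_err = 1 - cross_val_score(
    model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0)).mean()

# Leave-one-out: one fold per observation, nearly unbiased but slow on large data.
loo_err = 1 - cross_val_score(model, X, y, cv=LeaveOneOut()).mean()

print(f"hold-out: {holdout_err:.3f}  k-fold: {kfold_err:.3f}  LOO: {loo_err:.3f}")
```

In a full study along the lines of the abstract, this comparison would be repeated over many simulated datasets and over several combinations of mean difference and correlation, then the methods judged by how their estimated error rates behave.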