Comparison of Logistic Regression and Discriminant Analysis for Classification of Multicollinearity Data
Author: Autcha Araveeporn
Journal: WSEAS Transactions on Mathematics (JCR Q3, Mathematics)
Publication date: 2023-02-16
DOI: 10.37394/23206.2023.22.15 (https://doi.org/10.37394/23206.2023.22.15)
This study compares logistic regression and discriminant analysis as classification methods, using simulated datasets and a dataset of liver patients as real data. In both cases a binary dependent variable depends on correlated independent variables, that is, on multicollinear data. Logistic regression, the standard classification method, models the probability of a dichotomous dependent variable through the logit function. Its parameters are estimated by an iterative procedure and describe the relationship between the binary dependent variable and the independent variables. Discriminant analysis is a powerful family of classifiers that includes linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and regularized discriminant analysis (RDA). These methods derive their decision boundaries from a classifier built on the multivariate normal distribution. LDA assumes a covariance matrix common to all classes, whereas QDA uses a separate covariance matrix for each class. RDA extends QDA by introducing a regularization parameter into the estimate of the covariance matrix. In the simulation study, the independent variables are generated from a multivariate normal distribution with a constant correlation, which creates the multicollinearity problem, and the binary response variable is then obtained from the logit function. For the real data, the dependent variable classifies patients as liver or non-liver patients, and nine independent variables carry the patients' personal information. Performance is measured by the average percentage of accuracy, with the highest value indicating the best method. The results show that logistic regression performed well with a small number of independent variables, whereas RDA performed well with a large number of independent variables.
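The simulation design described in the abstract can be illustrated with a minimal sketch. It is not the author's code: the function names `simulate` and `correlation`, the sample size, the correlation value, and the logit coefficients are all hypothetical choices made for illustration. The sketch assumes standard-normal predictors and builds the constant correlation from one shared factor, which is only one of several ways to sample such a distribution.

```python
import math
import random


def simulate(n, p, rho, beta0, beta, seed=0):
    """Draw n rows of p equicorrelated predictors and a logit-based binary response.

    Assumes 0 <= rho <= 1. Each predictor is sqrt(rho) * shared + sqrt(1 - rho) * own,
    so it has unit variance and every pair of predictors has correlation rho.
    """
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # factor common to all predictors in this row
        x = [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ]
        eta = beta0 + sum(b * xi for b, xi in zip(beta, x))  # linear predictor
        prob = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
        labels.append(1 if rng.random() < prob else 0)  # Bernoulli draw
        rows.append(x)
    return rows, labels


def correlation(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative settings only; the abstract does not state the study's values.
X, y = simulate(n=2000, p=3, rho=0.9, beta0=0.0, beta=[1.0, 1.0, 1.0], seed=1)
first = [row[0] for row in X]
second = [row[1] for row in X]
print("empirical correlation:", round(correlation(first, second), 3))
print("share of class 1:", sum(y) / len(y))
```

A dataset produced this way could then be passed to any logistic regression or discriminant analysis implementation to compare accuracy as the number of predictors `p` grows.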
About the journal:
WSEAS Transactions on Mathematics publishes original research papers relating to applied and theoretical mathematics. We aim to bring important work to a wide international audience and therefore only publish papers of exceptional scientific value that advance our understanding of these particular areas. The research presented must transcend the limits of case studies, while both experimental and theoretical studies are accepted. It is a multi-disciplinary journal and therefore its content mirrors the diverse interests and approaches of scholars involved with linear algebra, numerical analysis, differential equations, statistics and related areas. We also welcome scholarly contributions from officials with government agencies, international agencies, and non-governmental organizations.