Comparison of Logistic Regression and Discriminant Analysis for Classification of Multicollinearity Data

JCR quartile: Q3 (Mathematics)

WSEAS Transactions on Mathematics; published 2023-02-16; DOI: 10.37394/23206.2023.22.15

Autcha Araveeporn


Citations: 0

Abstract



The objective of this study is to compare two classification approaches, logistic regression and discriminant analysis, using simulated datasets and a real dataset of liver patients. In both cases the binary dependent variable depends on correlated independent variables, that is, on multicollinear data. The standard classification method is logistic regression, which models the probability of the dichotomous dependent variable through the logit function. Its parameters are estimated by an iterative procedure, and the fitted model explains the relationship between the binary dependent variable and the independent variables. Discriminant analysis is a powerful family of classifiers comprising linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and regularized discriminant analysis (RDA). These methods construct decision boundaries from a classifier model built on the multivariate normal distribution. LDA assumes a common covariance matrix for all classes, whereas QDA estimates an individual covariance matrix for each class. RDA extends QDA by introducing a regularization parameter into the estimation of the covariance matrix. In the simulation study, the independent variables are generated from a multivariate normal distribution with a constant correlation between each pair of variables, which creates the multicollinearity problem. The binary response variable is then obtained from the logit function. For the application to real data, the dependent variable classifies patients as liver or non-liver patients, and nine independent variables are taken from the patients' personal information. Performance is measured by the average percentage accuracy, and the method with the highest average accuracy is judged best. The results show that logistic regression performed best with a small number of independent variables, whereas RDA performed best with a large number of independent variables.
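The simulation design and the logistic-regression fit described above can be sketched in Python with NumPy. This is a hypothetical illustration, not the author's code: the sample size `n = 500`, the number of predictors `p = 4`, the correlation `rho = 0.9`, and the true coefficients in `beta` are assumed values, and the iterative estimation is implemented here as Newton-Raphson, also known as iteratively reweighted least squares (IRLS), which is one standard way to maximize the logistic likelihood.

```python
import numpy as np

def simulate_multicollinear(n, p, rho, beta, rng):
    """Draw X ~ MVN(0, Sigma) with constant pairwise correlation rho, then y from the logit model."""
    # Constant-correlation covariance: 1 on the diagonal, rho everywhere else (assumed design).
    sigma = np.full((p, p), rho)
    np.fill_diagonal(sigma, 1.0)
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    # Logit link: P(y = 1 | x) = 1 / (1 + exp(-(b0 + x'b)))
    eta = beta[0] + X @ beta[1:]
    prob = 1.0 / (1.0 + np.exp(-eta))
    y = rng.binomial(1, prob)
    return X, y

def fit_logistic_irls(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    Xd = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    b = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(Xd @ b)))      # fitted probabilities
        w = mu * (1.0 - mu)                       # IRLS weights
        grad = Xd.T @ (y - mu)                    # score vector
        hess = Xd.T @ (Xd * w[:, None])           # Fisher information X'WX
        step = np.linalg.solve(hess, grad)
        b = b + step
        if np.max(np.abs(step)) < tol:
            break
    return b

def predict_logistic(b, X):
    """Classify as 1 when the fitted probability is at least 0.5."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-(Xd @ b))) >= 0.5).astype(int)

rng = np.random.default_rng(0)
beta = np.array([0.5, 2.0, -1.0, 1.0, 0.5])      # hypothetical intercept followed by 4 slopes
X, y = simulate_multicollinear(500, 4, 0.9, beta, rng)
b_hat = fit_logistic_irls(X, y)
train_acc = np.mean(predict_logistic(b_hat, X) == y)
print("estimated coefficients:", np.round(b_hat, 2))
print("training accuracy:", round(train_acc, 3))
```

With `rho` close to 1 the matrix `hess` tends to become ill-conditioned, which is one way multicollinearity shows up numerically: individual coefficient estimates become unstable even when classification accuracy stays reasonable.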
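The three discriminant classifiers can be sketched as a single Gaussian model whose class covariance matrices are regularized, compared by average test accuracy as in the study. The parametrization is an assumption, a simplified convex-combination form of Friedman's RDA that may differ from the one used in the paper: each class covariance `S_k` is shrunk toward the pooled covariance by `lam` and then toward a scaled identity by `gamma`. Setting `lam = 1, gamma = 0` gives LDA (common covariance) and `lam = 0, gamma = 0` gives QDA (individual covariances). The data design, the RDA values `(0.5, 0.1)`, and the 20 train/test replications are hypothetical.

```python
import numpy as np

def fit_rda(X, y, lam, gamma):
    """Fit Gaussian discriminant analysis with a simplified RDA covariance estimate.

    lam = 1, gamma = 0 -> LDA (one pooled covariance shared by all classes)
    lam = 0, gamma = 0 -> QDA (a separate covariance for each class)
    """
    classes = np.unique(y)
    n, p = X.shape
    means, covs, priors = {}, {}, {}
    pooled = np.zeros((p, p))
    for k in classes:
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        covs[k] = np.cov(Xk, rowvar=False)          # class covariance S_k
        pooled += (len(Xk) - 1) * covs[k]
        priors[k] = len(Xk) / n
    pooled /= n - len(classes)                       # pooled covariance S
    model = {}
    for k in classes:
        s = (1 - lam) * covs[k] + lam * pooled       # shrink S_k toward S
        s = (1 - gamma) * s + gamma * (np.trace(s) / p) * np.eye(p)  # shrink toward scaled identity
        model[k] = (means[k], np.linalg.inv(s), np.linalg.slogdet(s)[1], np.log(priors[k]))
    return model

def predict_rda(model, X):
    """Assign each row of X to the class with the largest discriminant score."""
    classes = list(model)
    scores = []
    for k in classes:
        mu, prec, logdet, logprior = model[k]
        d = X - mu
        # delta_k(x) = -0.5 log|S_k| - 0.5 (x - mu_k)' S_k^{-1} (x - mu_k) + log pi_k
        quad = np.einsum("ij,jk,ik->i", d, prec, d)
        scores.append(-0.5 * logdet - 0.5 * quad + logprior)
    return np.array(classes)[np.argmax(np.column_stack(scores), axis=1)]

# Hypothetical simulation: constant-correlation predictors, binary response from the logit.
rng = np.random.default_rng(1)
p, rho = 6, 0.9
sigma = np.full((p, p), rho)
np.fill_diagonal(sigma, 1.0)
beta = np.array([0.5, 1.5, -1.0, 1.0, 0.5, 0.5, -0.5])  # intercept followed by p slopes
settings = {"LDA": (1.0, 0.0), "QDA": (0.0, 0.0), "RDA": (0.5, 0.1)}
acc = {name: [] for name in settings}
for rep in range(20):
    X = rng.multivariate_normal(np.zeros(p), sigma, size=400)
    y = rng.binomial(1, 1 / (1 + np.exp(-(beta[0] + X @ beta[1:]))))
    train, test = slice(0, 300), slice(300, 400)
    for name, (lam, gamma) in settings.items():
        model = fit_rda(X[train], y[train], lam, gamma)
        acc[name].append(np.mean(predict_rda(model, X[test]) == y[test]))

for name, a in acc.items():
    print(f"{name}: average test accuracy = {100 * np.mean(a):.1f}%")
```

In practice `lam` and `gamma` are usually chosen by cross-validation rather than fixed as in this sketch.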


Source journal: WSEAS Transactions on Mathematics (subject area: Mathematics, Discrete Mathematics and Combinatorics)

CiteScore: 1.30

Self-citation rate: 0.00%

Journal description: WSEAS Transactions on Mathematics publishes original research papers relating to applied and theoretical mathematics. We aim to bring important work to a wide international audience and therefore only publish papers of exceptional scientific value that advance our understanding of these particular areas. The research presented must transcend the limits of case studies, while both experimental and theoretical studies are accepted. It is a multi-disciplinary journal and therefore its content mirrors the diverse interests and approaches of scholars involved with linear algebra, numerical analysis, differential equations, statistics and related areas. We also welcome scholarly contributions from officials with government agencies, international agencies, and non-governmental organizations.