A comparative study between shrinkage methods (ridge-lasso) using simulation

Q1 Engineering
Z. Ghareeb, Suhad Ali Shaheed Al-Temimi
Journal: Periodicals of Engineering and Natural Sciences
Published: 2023-03-06
DOI: 10.21533/pen.v11i2.3472 (https://doi.org/10.21533/pen.v11i2.3472)

Abstract

The general linear model is widely used in many scientific fields, especially the biological sciences. The Ordinary Least Squares (OLS) estimators of its coefficients have desirable properties, summarized by the acronym BLUE (Best Linear Unbiased Estimator), provided that the basic assumptions of the model are met. Failure to satisfy one of these assumptions can produce estimators with low bias but high variance, which leads to poor performance in both prediction and interpretation of the model. The absence of multicollinearity, that is, of linear relationships among the explanatory variables, is one of the main assumptions on which the model is based. When this assumption is violated, the results are misleading and the confidence intervals for the coefficients of the affected variables become wide. Shrinkage methods are considered among the most effective and preferred ways to address multicollinearity; they do so by reducing the variance of the estimators in the model. Ridge and Lasso are the most common of these shrinkage methods. The simulation was carried out for different sample sizes (n = 40, 120, 200) and numbers of variables chosen arbitrarily (p = 30, 60) in the first and second experiments, at low, medium, and high levels of correlation (0.2, 0.5, 0.8). For both p = 30 and p = 60, the Lasso method had a smaller mean squared error (MSE) than the Ridge method, which demonstrates its efficiency. The optimal penalty parameter (λ) was chosen by cross-validation, minimizing the prediction MSE.
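For reference, the two estimators and the cross-validated choice of λ can be written in their standard textbook form. The notation here (response vector y, design matrix X with rows x_i, folds F_1, ..., F_K, and the fold-deleted estimate β̂^(-k)) is an assumption for illustration and does not appear in the abstract itself.

$$
\hat{\beta}^{\text{ridge}}(\lambda) = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2,
\qquad
\hat{\beta}^{\text{lasso}}(\lambda) = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
$$

$$
\hat{\lambda} = \arg\min_{\lambda} \; \frac{1}{K} \sum_{k=1}^{K} \frac{1}{\lvert F_k \rvert} \sum_{i \in F_k} \left( y_i - x_i^{\top} \hat{\beta}^{(-k)}(\lambda) \right)^2
$$

The squared ℓ2 penalty shrinks every coefficient smoothly toward zero, whereas the ℓ1 penalty can set some coefficients exactly to zero.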
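We see a rapid increase in MSE for both Ridge and Lasso (the top axis indicates the number of variables in the model), and as the correlation between variables and the sample size increase, the MSE increases more for the Ridge method than for the Lasso method. The Ridge method is more efficient when the sample size is larger than the number of variables (p < n), but it cannot shrink coefficients to exactly zero. As a result, the flexibility of the ridge fit decreases, trading lower variance for higher bias, and the MSE first remains relatively constant and then increases rapidly.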
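The simulation design described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn and NumPy, an equicorrelated Gaussian design, a hypothetical sparse coefficient vector (the abstract does not state the true coefficients), and prediction MSE on an independent test set as the comparison metric. The names `simulate` and `compare` are invented for this sketch.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error


def simulate(n, p, rho, rng, n_signal=5, noise_sd=1.0):
    """Draw n observations of p equicorrelated Gaussian predictors and a linear response."""
    cov = np.full((p, p), rho)       # pairwise correlation rho between all predictors
    np.fill_diagonal(cov, 1.0)
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:n_signal] = 1.0            # hypothetical sparse truth (an assumption of this sketch)
    y = X @ beta + rng.normal(scale=noise_sd, size=n)
    return X, y


def compare(n, p, rho, seed=0, n_test=1000):
    """Fit ridge and lasso with lambda chosen by 5-fold CV; report test-set MSE."""
    rng = np.random.default_rng(seed)
    X_train, y_train = simulate(n, p, rho, rng)
    X_test, y_test = simulate(n_test, p, rho, rng)

    # scikit-learn calls the penalty parameter "alpha" rather than lambda.
    ridge = RidgeCV(
        alphas=np.logspace(-3, 3, 30), cv=5, scoring="neg_mean_squared_error"
    ).fit(X_train, y_train)
    lasso = LassoCV(cv=5, max_iter=50_000).fit(X_train, y_train)

    return {
        "ridge_mse": mean_squared_error(y_test, ridge.predict(X_test)),
        "lasso_mse": mean_squared_error(y_test, lasso.predict(X_test)),
        "ridge_nonzero": int(np.count_nonzero(ridge.coef_)),
        "lasso_nonzero": int(np.count_nonzero(lasso.coef_)),
    }


if __name__ == "__main__":
    # Grid taken from the abstract: n in (40, 120, 200), p in (30, 60), rho in (0.2, 0.5, 0.8).
    for p in (30, 60):
        for n in (40, 120, 200):
            for rho in (0.2, 0.5, 0.8):
                print(n, p, rho, compare(n, p, rho))
```

The nonzero-coefficient counts make the qualitative difference visible: ridge keeps all p coefficients nonzero, while lasso may set some exactly to zero. Which method attains the lower MSE in a given cell depends on the assumed coefficient vector and noise level, so a single run of this sketch should not be read as reproducing the paper's results.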