Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning

Li Wang, Michael D. Gordon, Ji Zhu
{"title":"Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning","authors":"Li Wang, Michael D. Gordon, Ji Zhu","doi":"10.1109/ICDM.2006.134","DOIUrl":null,"url":null,"abstract":"Linear regression is one of the most important and widely used techniques for data analysis. However, sometimes people are not satisfied with it because of the following two limitations: 1) its results are sensitive to outliers, so when the error terms are not normally distributed, especially when they have heavy-tailed distributions, linear regression often works badly; 2) its estimated coefficients tend to have high variance, although their bias is low. To reduce the influence of outliers, robust regression models were developed. Least absolute deviation (LAD) regression is one of them. LAD minimizes the mean absolute errors, instead of mean squared errors, so its results are more robust. To address the second limitation, shrinkage methods were proposed, which add a penalty on the size of the coefficients. The LASSO is one of these methods and it uses the L1-norm penalty, which not only reduces the prediction error and the variance of estimated coefficients, but also provides an automatic feature selection function. In this paper, we propose the regularized least absolute deviation (RLAD) regression model, which combines the nice features of the LAD and the LASSO together. The RLAD is a regularization method, whose objective function has the form of \"loss + penalty.\" The \"loss\" is the sum of the absolute deviations and the \"penalty\" is the L1-norm of the coefficient vector. Furthermore, to facilitate parameter tuning, we develop an efficient algorithm which can solve the entire regularization path in one pass. Simulations with various settings are performed to demonstrate its performance. Finally, we apply the algorithm to solve the image reconstruction problem and find interesting results.","PeriodicalId":356443,"journal":{"name":"Sixth International Conference on Data Mining (ICDM'06)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"95","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sixth International Conference on Data Mining (ICDM'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2006.134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 95

Abstract

Linear regression is one of the most important and widely used techniques for data analysis. However, it suffers from two well-known limitations: 1) its results are sensitive to outliers, so when the error terms are not normally distributed, and especially when they have heavy-tailed distributions, linear regression often performs poorly; 2) its estimated coefficients tend to have high variance, although their bias is low. To reduce the influence of outliers, robust regression models were developed; least absolute deviation (LAD) regression is one of them. LAD minimizes the sum of absolute errors rather than the sum of squared errors, so its results are more robust to outliers. To address the second limitation, shrinkage methods were proposed, which add a penalty on the size of the coefficients. The LASSO is one such method; it uses an L1-norm penalty, which not only reduces the prediction error and the variance of the estimated coefficients, but also performs automatic feature selection. In this paper, we propose the regularized least absolute deviation (RLAD) regression model, which combines the strengths of LAD and the LASSO. RLAD is a regularization method whose objective function has the form "loss + penalty": the "loss" is the sum of the absolute deviations, and the "penalty" is the L1-norm of the coefficient vector. Furthermore, to facilitate parameter tuning, we develop an efficient algorithm that solves the entire regularization path in one pass. Simulations with various settings demonstrate its performance. Finally, we apply the algorithm to an image reconstruction problem and find interesting results.
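For concreteness, the "loss + penalty" objective described in the abstract can be written out as follows. The notation is assumed here (lambda as the regularization parameter, beta_0 as an intercept), since the paper's own equations are not reproduced on this page:

```latex
\min_{\beta_0,\,\beta}\;\sum_{i=1}^{n}\left| y_i - \beta_0 - x_i^{\top}\beta \right| \;+\; \lambda\sum_{j=1}^{p}\left|\beta_j\right|
```

Because both the loss and the penalty are piecewise linear, the problem for a fixed lambda can be posed as a linear program; this structure is also what makes the solution path piecewise linear in lambda and traceable in one pass. Below is a minimal Python sketch that solves a single point on the path with scipy.optimize.linprog. It illustrates the RLAD objective only; it is not the paper's path-following algorithm, and the function name rlad_fit is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def rlad_fit(X, y, lam):
    """Fit RLAD for a single penalty value `lam` via linear programming.

    LP variables: [beta0, beta (p), u (n), t (p)], where u_i bounds the
    absolute residual |y_i - beta0 - x_i' beta| and t_j bounds |beta_j|.
    (Illustrative sketch, not the paper's path algorithm.)
    """
    n, p = X.shape

    # Objective: sum of u_i (absolute deviations) + lam * sum of t_j (L1 penalty).
    c = np.concatenate([np.zeros(1 + p), np.ones(n), lam * np.ones(p)])

    # Residual constraints:  +/-(y_i - beta0 - x_i' beta) <= u_i
    A1 = np.hstack([-np.ones((n, 1)), -X, -np.eye(n), np.zeros((n, p))])
    A2 = np.hstack([ np.ones((n, 1)),  X, -np.eye(n), np.zeros((n, p))])
    # Penalty constraints:  +/-beta_j <= t_j
    A3 = np.hstack([np.zeros((p, 1)),  np.eye(p), np.zeros((p, n)), -np.eye(p)])
    A4 = np.hstack([np.zeros((p, 1)), -np.eye(p), np.zeros((p, n)), -np.eye(p)])

    A_ub = np.vstack([A1, A2, A3, A4])
    b_ub = np.concatenate([-y, y, np.zeros(2 * p)])

    # beta0 and beta are free; the auxiliary u and t are nonnegative.
    bounds = [(None, None)] * (1 + p) + [(0, None)] * (n + p)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1:1 + p]
```

As a quick usage example on synthetic data with heavy-tailed errors (the setting where LAD-type losses are expected to help):

```python
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_t(df=1.5, size=100)  # heavy-tailed noise

beta0_hat, beta_hat = rlad_fit(X, y, lam=5.0)
print(np.round(beta_hat, 2))  # coefficients of irrelevant features shrink toward 0
```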