The Adaptive $\tau$-Lasso: Robustness and Oracle Properties
Authors: Emadaldin Mozafari-Majd; Visa Koivunen
Journal: IEEE Transactions on Signal Processing, vol. 73, pp. 2464-2479 (JCR Q1, Engineering, Electrical & Electronic; impact factor 4.6)
Published: 2025-04-21
DOI: 10.1109/TSP.2025.3563225
URL: https://ieeexplore.ieee.org/document/10972300/
Citations: 0
Abstract
This paper introduces a new regularized version of the robust $\tau$-regression estimator for analyzing high-dimensional datasets subject to gross contamination in the response variables and covariates (explanatory variables). The resulting estimator, termed adaptive $\tau$-Lasso, is robust to outliers and high-leverage points. It also incorporates an adaptive $\ell_{1}$-norm penalty term, which enables the selection of relevant variables and reduces the bias associated with large true regression coefficients. More specifically, this adaptive $\ell_{1}$-norm penalty term assigns a weight to each regression coefficient. For a fixed number of predictors $ p $, we show that the adaptive $\tau$-Lasso has the oracle property, ensuring both variable-selection consistency and asymptotic normality under fairly mild conditions. Asymptotic normality applies only to the entries of the regression vector corresponding to the true support, assuming knowledge of the true regression vector support. We characterize its robustness by establishing the finite-sample breakdown point and the influence function. We carry out extensive simulations and observe that the class of $\tau$-Lasso estimators exhibits robustness and reliable performance in both contaminated and uncontaminated data settings. We also validate our theoretical findings on robustness properties through simulations. In the face of outliers and high-leverage points, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators achieve the best performance or match the best performances of competing regularized estimators, with minimal or no loss in terms of prediction and variable selection accuracy for almost all scenarios considered in this study. Therefore, the adaptive $\tau$-Lasso and $\tau$-Lasso estimators provide attractive tools for a variety of sparse linear regression problems, particularly in high-dimensional settings and when the data is contaminated by outliers and high-leverage points. 
However, it is worth noting that no particular estimator uniformly dominates others in all considered scenarios.
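The abstract describes the estimator only in words: a robust $\tau$-scale of the residuals plus an adaptive $\ell_{1}$-norm penalty that assigns a weight to each coefficient. The following is a minimal sketch of how such an objective could be evaluated, not the authors' implementation. It assumes the usual adaptive-Lasso weight form $w_j = 1/|\tilde{\beta}_j|^{\gamma}$ computed from a pilot estimate $\tilde{\beta}$, and a $\tau$-scale built from two Tukey bisquare losses. All function names, the tuning constants `c0`, `c1`, `delta`, and the `eps` guard are illustrative assumptions that do not come from the abstract.

```python
import numpy as np


def adaptive_weights(beta_pilot, gamma=1.0, eps=1e-8):
    """Assumed weight form: w_j = 1 / max(|beta_pilot_j|, eps)**gamma."""
    beta_pilot = np.asarray(beta_pilot, dtype=float)
    return 1.0 / np.maximum(np.abs(beta_pilot), eps) ** gamma


def adaptive_l1_penalty(beta, weights, lam):
    """Weighted l1 penalty: lam * sum_j w_j * |beta_j|."""
    beta = np.asarray(beta, dtype=float)
    return lam * float(np.sum(weights * np.abs(beta)))


def rho_bisquare(u, c):
    """Tukey bisquare loss, normalized so that rho(u) = 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3


def tau_scale(residuals, c0=1.5476, c1=6.08, delta=0.5, n_iter=100, tol=1e-10):
    """Tau-scale of the residuals (constants are assumed, illustrative values).

    Step 1: solve mean(rho_c0(r / s)) = delta for the M-scale s by
            fixed-point iteration.
    Step 2: return tau = s * sqrt(mean(rho_c1(r / s))).
    """
    r = np.asarray(residuals, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # MAD-type starting value
    if s == 0.0:
        # Guard: at least half of the residuals are exactly zero.
        return 0.0
    for _ in range(n_iter):
        s_new = s * np.sqrt(np.mean(rho_bisquare(r / s, c0)) / delta)
        converged = abs(s_new - s) <= tol * s
        s = s_new
        if converged:
            break
    return s * np.sqrt(np.mean(rho_bisquare(r / s, c1)))


def adaptive_tau_lasso_objective(beta, X, y, weights, lam):
    """Squared tau-scale of the residuals plus the weighted l1 penalty."""
    beta = np.asarray(beta, dtype=float)
    r = np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ beta
    return tau_scale(r) ** 2 + adaptive_l1_penalty(beta, weights, lam)
```

Under these assumptions, coefficients with a small pilot estimate get a large weight and are penalized heavily, while large coefficients are penalized lightly, which is how the adaptive penalty reduces bias on large true coefficients. Minimizing this objective requires a dedicated solver, which the sketch does not attempt.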
About the journal:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.