Low rank approximation with entrywise l1-norm error

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Published: 2017-06-19. DOI: 10.1145/3055399.3055431

Zhao Song, David P. Woodruff, Peilin Zhong

Citations: 96

Abstract

We study the $\ell_1$-low rank approximation problem: given an $n \times d$ matrix $A$ and an approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\hat{A}$ for which $\|A - \hat{A}\|_1 \leq \alpha \cdot \min_{\text{rank-}k\ A'} \|A - A'\|_1$, where for an $n \times d$ matrix $C$ we let $\|C\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{d} |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis, and a number of heuristics have been proposed. It was asked in multiple places whether there are any approximation algorithms. We give the first provable approximation algorithms for $\ell_1$-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d)\,\mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1 + 1/\log^{1+\gamma}(nd))$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for $\ell_1$-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to $\ell_p$-norms for $1 \leq p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
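To make the objective and the robustness claim concrete, here is a minimal pure-Python sketch for the case k = 1. It is not the authors' algorithm and carries no approximation guarantee: `l1_rank1_alternating` is a hypothetical alternating heuristic that relies on the fact that, with one factor fixed, each ℓ1 subproblem is solved exactly by a weighted median. `frobenius_rank1` computes the Frobenius-optimal rank-1 approximation by power iteration for comparison. All function names, the all-ones initializations, the iteration counts, and the 10 × 10 toy matrix are assumptions made for illustration.

```python
import math


def entrywise_l1(C):
    """Entrywise l1 norm ||C||_1: the sum of |C[i][j]| over all entries."""
    return sum(abs(x) for row in C for x in row)


def outer(u, v):
    """Rank-1 matrix u v^T, stored as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def residual_l1(A, B):
    """||A - B||_1 for two matrices of the same shape."""
    return entrywise_l1([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])


def weighted_median(values, weights):
    """A minimizer of sum_i weights[i] * |values[i] - t| (non-empty input, weights > 0)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for val, w in pairs:
        acc += w
        if acc >= half:
            return val
    return pairs[-1][0]  # only reached through floating-point rounding


def best_scale(y, x):
    """Exact l1 fit of y by a multiple of x: argmin_t sum_i |y[i] - x[i] * t|.

    Entries with x[i] == 0 contribute a constant, so they are skipped; the rest
    equal |x[i]| * |y[i] / x[i] - t|, which a weighted median minimizes.
    """
    ratios = [yi / xi for yi, xi in zip(y, x) if xi != 0]
    weights = [abs(xi) for xi in x if xi != 0]
    return weighted_median(ratios, weights) if ratios else 0.0


def l1_rank1_alternating(A, iters=50):
    """Hypothetical k = 1 heuristic: alternately refit v (u fixed) and u (v fixed).

    Each half-step is an exact l1 minimization, so ||A - u v^T||_1 does not
    increase (up to rounding), but it may stall at a poor local solution.
    """
    n, d = len(A), len(A[0])
    u = [1.0] * n  # assumed all-ones initialization
    v = [0.0] * d
    for _ in range(iters):
        v = [best_scale([A[i][j] for i in range(n)], u) for j in range(d)]
        u = [best_scale(A[i], v) for i in range(n)]
    return u, v


def frobenius_rank1(A, iters=200):
    """Frobenius-optimal rank-1 approximation (A v) v^T via power iteration on A^T A.

    Assumes the all-ones start vector is not orthogonal to the top right singular vector.
    """
    n, d = len(A), len(A[0])
    v = [1.0] * d
    for _ in range(iters):
        av = [sum(A[i][j] * v[j] for j in range(d)) for i in range(n)]
        w = [sum(A[i][j] * av[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # A^T A v = 0: no progress possible from this start vector
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(d)) for i in range(n)]
    return u, v


if __name__ == "__main__":
    # Toy instance: the rank-1 all-ones 10 x 10 matrix with one corrupted entry.
    A = [[1.0] * 10 for _ in range(10)]
    A[0][0] += 30.0
    for name, fit in [("l1 alternating", l1_rank1_alternating),
                      ("Frobenius", frobenius_rank1)]:
        u, v = fit(A)
        print(f"{name}: l1 error = {residual_l1(A, outer(u, v)):.2f}")
```

On this toy instance the ℓ1 heuristic returns the uncorrupted all-ones matrix (ℓ1 error 30, exactly the size of the outlier), while the Frobenius-optimal rank-1 matrix bends toward the outlier and ends up with an ℓ1 error of roughly 83. If the outlier is made much larger relative to the rest of the matrix, fitting it can become cheaper even in ℓ1, so the example illustrates the trend rather than a general guarantee.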
