Low rank approximation with entrywise ℓ1-norm error

Zhao Song, David P. Woodruff, Peilin Zhong
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Published 2017-06-19. DOI: 10.1145/3055399.3055431 (https://doi.org/10.1145/3055399.3055431)
Citations: 96

Abstract

We study the ℓ1-low rank approximation problem: given an $n \times d$ matrix $A$ and an approximation factor $\alpha \ge 1$, the goal is to output a rank-$k$ matrix $\hat{A}$ for which $\|A-\hat{A}\|_1 \le \alpha \cdot \min_{\text{rank-}k\ A'} \|A-A'\|_1$, where for an $n \times d$ matrix $C$ we let $\|C\|_1 = \sum_{i=1}^{n} \sum_{j=1}^{d} |C_{i,j}|$. This error measure is known to be more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. The problem was shown to be NP-hard by Gillis and Vavasis, and a number of heuristics have been proposed. It was asked in multiple places whether there are any approximation algorithms. We give the first provable approximation algorithms for ℓ1-low rank approximation, showing that it is possible to achieve approximation factor $\alpha = (\log d) \cdot \mathrm{poly}(k)$ in $\mathrm{nnz}(A) + (n+d)\,\mathrm{poly}(k)$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$. If $k$ is constant, we further improve the approximation ratio to $O(1)$ with a $\mathrm{poly}(nd)$-time algorithm. Under the Exponential Time Hypothesis, we show there is no $\mathrm{poly}(nd)$-time algorithm achieving a $(1 + 1/\log^{1+\gamma}(nd))$-approximation, for $\gamma > 0$ an arbitrarily small constant, even when $k = 1$. We give a number of additional results for ℓ1-low rank approximation: nearly tight upper and lower bounds for column subset selection, CUR decompositions, extensions to low rank approximation with respect to ℓp-norms for $1 \le p < 2$ and earthmover distance, low-communication distributed protocols and low-memory streaming algorithms, algorithms with limited randomness, and bicriteria algorithms. We also give a preliminary empirical evaluation.
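To make the objective concrete, the following is a minimal Python sketch of the entrywise ℓ1 error from the abstract. It is not the paper's algorithm: the rank-k candidate here is a plain truncated SVD (optimal for the Frobenius norm, with no ℓ1 guarantee), used only as a baseline. The function names `entrywise_l1` and `truncated_svd`, the matrix sizes, and the planted outlier are all illustrative assumptions.

```python
import numpy as np


def entrywise_l1(C):
    """Entrywise l1 norm: the sum of |C[i, j]| over all entries."""
    return float(np.abs(C).sum())


def truncated_svd(A, k):
    """Rank-k truncated SVD of A.

    This is the best rank-k approximation in Frobenius norm. It is a
    baseline only and carries no guarantee for the l1 objective.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical test instance: an exactly rank-k matrix plus one large outlier.
rng = np.random.default_rng(0)
n, d, k = 50, 30, 2
L = rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
A = L.copy()
A[3, 7] += 1000.0

# Two rank-k candidates: the SVD baseline and the planted matrix L.
A_svd = truncated_svd(A, k)
err_svd = entrywise_l1(A - A_svd)
err_planted = entrywise_l1(A - L)

# The error of any rank-k candidate upper-bounds the optimal l1 error,
# so the smaller of the two is a valid upper bound on the optimum.
opt_upper_bound = min(err_svd, err_planted)

print("l1 error of truncated SVD:", err_svd)
print("l1 error of planted rank-k matrix:", err_planted)
print("upper bound on the optimum:", opt_upper_bound)
```

An α-approximation in the sense of the abstract would return a rank-k matrix whose ℓ1 error is at most α times the true optimum. Because computing that optimum is NP-hard, the sketch only reports an upper bound on it.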