Shuffled Linear Regression with Erroneous Observations

2019 53rd Annual Conference on Information Sciences and Systems (CISS). Pub Date: 2019-03-01. DOI: 10.1109/CISS.2019.8692838

S. Saab, Khaled Kamal Saab, S. Saab


Citations: 5

Abstract

Linear regression with shuffled labels is the problem of fitting a linear regression model to datasets whose labels have been shuffled, in an unknown way, with respect to their inputs. This problem arises in applications such as genome sequence assembly, sampling and reconstruction of spatial fields, and communication networks. Existing methods apply only to data with limited observation errors, work only for partially shuffled data, are sensitive to initialization, and/or work only in small dimensions. This paper tackles the problem in its full generality using stochastic approximation based on a first-order permutation-invariant constraint. We propose an optimal recursive algorithm that updates the estimate from the underdetermined function derived from that constraint. The proposed algorithm aims to minimize the mean-square estimation error at each iteration. Although our algorithm is sensitive to initialization errors, to the best of our knowledge the resulting method is the first working solution for arbitrarily large dimensions and arbitrarily large observation errors, while its computational cost appears insignificant. Numerical simulations show that our method, applied to shuffled datasets, can outperform the ordinary least squares method applied to unshuffled data. We also consider a batch approach to this problem, in which the datasets are independently available. The batch solution we propose is independent of initialization but requires the number of such datasets to be at least equal to the dimension of the unknown vector.
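The abstract does not state the permutation-invariant constraint explicitly. One natural first-order choice, assumed here for illustration, is the sum of the labels: shuffling the labels of a dataset leaves their sum unchanged, so each dataset contributes one linear equation in the unknown vector. A single equation is underdetermined, which is consistent with the stated requirement that the batch approach needs at least as many datasets as unknowns. The sketch below stacks one such equation per dataset and solves them by least squares. It is a minimal illustration under that assumption, not the authors' algorithm, and the function name `batch_shuffled_fit` and all simulation parameters are hypothetical.

```python
import numpy as np

def batch_shuffled_fit(datasets):
    """Estimate w from K shuffled datasets using only sums (assumed constraint).

    Each dataset is a pair (X, y_shuffled) with X of shape (n, d).
    For any permutation of y: sum(y) = sum(X, axis=0) @ w + sum(noise).
    """
    A = np.array([X.sum(axis=0) for X, _ in datasets])  # shape (K, d)
    b = np.array([y.sum() for _, y in datasets])        # shape (K,)
    if A.shape[0] < A.shape[1]:
        raise ValueError("need at least d datasets for a d-dimensional w")
    w_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w_hat

# Simulated data: K independent datasets, each with fully shuffled labels.
rng = np.random.default_rng(0)
d, n, K = 3, 50, 40
w_true = np.array([1.5, -2.0, 0.5])
datasets = []
for _ in range(K):
    X = rng.normal(size=(n, d))
    y = X @ w_true + 0.1 * rng.normal(size=n)  # noisy observations
    datasets.append((X, rng.permutation(y)))   # labels shuffled

w_hat = batch_shuffled_fit(datasets)
print(w_hat)
```

Because only sums are used, the estimate never needs to recover the permutation itself, and no initial guess is required.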
