Optimal mean-based algorithms for trace reconstruction

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. Pub Date: 2016-12-09. DOI: 10.1145/3055399.3055450

Anindya De, R. O'Donnell, R. Servedio


Citations: 58

Abstract

In the (deletion-channel) trace reconstruction problem, there is an unknown $n$-bit source string $x$. An algorithm is given access to independent traces of $x$, where a trace is formed by deleting each bit of $x$ independently with probability $\delta$. The goal of the algorithm is to recover $x$ exactly (with high probability), while minimizing samples (number of traces) and running time. Previously, the best known algorithm for the trace reconstruction problem was due to Holenstein et al. [SODA 2008]; it uses $\exp(O(n^{1/2}))$ samples and running time for any fixed $0 < \delta < 1$. It is also what we call a "mean-based algorithm", meaning that it only uses the empirical means of the individual bits of the traces. Holenstein et al. also gave a lower bound, showing that any mean-based algorithm must use at least $n^{\Omega(\log n)}$ samples. In this paper we improve both of these results, obtaining matching upper and lower bounds for mean-based trace reconstruction. For any constant deletion rate $0 < \delta < 1$, we give a mean-based algorithm that uses $\exp(O(n^{1/3}))$ time and traces; we also prove that any mean-based algorithm must use at least $\exp(\Omega(n^{1/3}))$ traces. In fact, we obtain matching upper and lower bounds even for $\delta$ subconstant and $\rho := 1 - \delta$ subconstant: when $(\log^3 n)/n \ll \delta \le 1/2$ the bound is $\exp(-\Theta(\delta n)^{1/3})$, and when $1/\sqrt{n} \ll \rho \le 1/2$ the bound is $\exp(-\Theta(n/\rho)^{1/3})$. Our proofs involve estimates for the maxima of Littlewood polynomials on complex disks. We show that these techniques can also be used to perform trace reconstruction with random insertions and bit-flips in addition to deletions. We also find a surprising result: for deletion probabilities $\delta > 1/2$, the presence of insertions can actually help with trace reconstruction.
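To make the deletion channel and the notion of "empirical means of the individual bits" concrete, here is a minimal Python sketch. It is not the paper's reconstruction algorithm; it only simulates traces and computes the per-position statistics that a mean-based algorithm is allowed to look at. The function names, the convention of padding each trace with zeros up to length $n$, and the closed-form expression in `expected_means` are illustrative assumptions rather than details taken from the abstract.

```python
import random
from math import comb


def trace(x, delta, rng):
    """Pass x through the deletion channel: each bit is dropped
    independently with probability delta (kept with probability 1 - delta)."""
    return [b for b in x if rng.random() >= delta]


def empirical_means(x, delta, num_traces, rng):
    """Average the j-th bit over many traces.

    Assumption: a trace shorter than len(x) is treated as padded with zeros,
    so every position j in 0..n-1 has a well-defined mean.
    """
    n = len(x)
    sums = [0] * n
    for _ in range(num_traces):
        for j, b in enumerate(trace(x, delta, rng)):
            sums[j] += b
    return [s / num_traces for s in sums]


def expected_means(x, delta):
    """Exact expectation of the j-th bit of a zero-padded trace.

    Bit x[i] lands at position j when it survives (probability rho) and
    exactly j of the i bits before it survive (a binomial event).
    """
    n = len(x)
    rho = 1.0 - delta
    return [
        sum(x[i] * comb(i, j) * rho ** (j + 1) * delta ** (i - j)
            for i in range(j, n))
        for j in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)           # fixed seed, chosen only for repeatability
    x = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical source string
    delta = 0.3
    print("exact    :", [round(m, 3) for m in expected_means(x, delta)])
    print("empirical:", [round(m, 3) for m in empirical_means(x, delta, 20000, rng)])
```

Under these assumptions the exact means satisfy $\sum_j \mathbb{E}[y_j]\, w^j = \rho \sum_i x_i (\delta + \rho w)^i$ for every $w$, which follows from the binomial theorem. Distinguishing two source strings from their means therefore amounts to bounding a polynomial with coefficients in $\{-1, 0, 1\}$, which is presumably where the Littlewood-polynomial estimates mentioned in the abstract come in.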


