Approximate Trace Reconstruction via Median String (in Average-Case)

Foundations of Software Technology and Theoretical Computer Science. Pub Date: 2021-07-20. DOI: 10.4230/LIPIcs.FSTTCS.2021.11

Diptarka Chakraborty, Debarati Das, Robert Krauthgamer

Citations: 4

Abstract

We consider an \emph{approximate} version of the trace reconstruction problem, where the goal is to recover an unknown string $s\in\{0,1\}^n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic insertion-deletion channel with rate $p$). We present a deterministic algorithm for the average-case model, where $s$ is random, that uses only \emph{three} traces. It runs in near-linear time $\tilde O(n)$ and with high probability reports a string within edit distance $O(\epsilon p n)$ from $s$ for $\epsilon=\tilde O(p)$, which significantly improves over the straightforward bound of $O(pn)$. Technically, our algorithm computes a $(1+\epsilon)$-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown $s$. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time $O(n^3)$.
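The setting in the abstract can be illustrated with a small Python sketch. It is not the paper's near-linear algorithm: it implements the cubic dynamic program for an optimal median of three strings, i.e. the baseline the abstract says has to be bypassed. The channel model in `make_trace` (a random bit inserted before each position with probability $p$, each bit deleted with probability $p$) is an assumption made for illustration, since the abstract does not spell out the channel, and all function names are hypothetical.

```python
import random
from functools import lru_cache
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit (Levenshtein) distance via the standard two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def make_trace(s: str, p: float, rng: random.Random) -> str:
    """Assumed channel: before each bit, insert a random bit with
    probability p; independently delete the bit itself with probability p."""
    out = []
    for bit in s:
        if rng.random() < p:
            out.append(rng.choice("01"))
        if rng.random() >= p:
            out.append(bit)
    return "".join(out)


def exact_median(x: str, y: str, z: str) -> str:
    """Optimal median of three binary strings: a string minimizing the sum
    of edit distances to x, y and z. Cubic-size DP over prefix triples."""
    strs = (x, y, z)
    # Each alignment column advances a non-empty subset of the three strings.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]

    @lru_cache(maxsize=None)
    def best(i: int, j: int, k: int):
        pos = (i, j, k)
        if all(pos[t] == len(strs[t]) for t in range(3)):
            return 0, ""
        result = None
        for m in moves:
            if any(pos[t] + m[t] > len(strs[t]) for t in range(3)):
                continue
            chars = [strs[t][pos[t]] if m[t] else None for t in range(3)]
            tail_cost, tail = best(*(pos[t] + m[t] for t in range(3)))
            for c in ("", "0", "1"):  # "" means the median has a gap here
                if c == "":
                    col = sum(m)  # every consumed character is unmatched
                else:
                    # a string pays 1 if it skips this column or mismatches c
                    col = sum(1 for ch in chars if ch != c)
                cand = (col + tail_cost, c + tail)
                if result is None or cand < result:
                    result = cand
        return result

    return best(0, 0, 0)[1]


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 10, 0.1  # tiny n: the DP is cubic and recursive
    s = "".join(rng.choice("01") for _ in range(n))
    traces = [make_trace(s, p, rng) for _ in range(3)]
    med = exact_median(*traces)
    print("s      :", s)
    print("traces :", traces)
    print("median :", med)
    print("d(trace, s) :", [edit_distance(t, s) for t in traces])
    print("d(median, s):", edit_distance(med, s))
```

Running the script on a few seeds lets one compare how far each individual trace is from $s$ (the straightforward $O(pn)$ bound) with how far the median of the three traces is. The sketch is only practical for very short strings, which is exactly the limitation that motivates a near-linear approximate-median algorithm.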
