Approximate Trace Reconstruction via Median String (in Average-Case)

Diptarka Chakraborty, Debarati Das, Robert Krauthgamer
Foundations of Software Technology and Theoretical Computer Science (FSTTCS), published 2021-07-20. DOI: 10.4230/LIPIcs.FSTTCS.2021.11 (https://doi.org/10.4230/LIPIcs.FSTTCS.2021.11)
Citation count: 4

Abstract

We consider an approximate version of the trace reconstruction problem, where the goal is to recover an unknown string $s\in\{0,1\}^n$ from $m$ traces; each trace is generated independently by passing $s$ through a probabilistic insertion-deletion channel with rate $p$. We present a deterministic algorithm for the average-case model, where $s$ is random, that uses only three traces. It runs in near-linear time $\tilde O(n)$ and, with high probability, reports a string within edit distance $O(\epsilon p n)$ from $s$ for $\epsilon=\tilde O(p)$, which significantly improves over the straightforward bound of $O(pn)$. Technically, our algorithm computes a $(1+\epsilon)$-approximate median of the three input traces. To prove its correctness, our probabilistic analysis shows that an approximate median is indeed close to the unknown $s$. To achieve a near-linear time bound, we have to bypass the well-known dynamic programming algorithm that computes an optimal median in time $O(n^3)$.
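To make the setting concrete, here is a minimal Python sketch of the objects the abstract refers to: a random string $s$, three traces, edit distance, and the median objective (the sum of edit distances from a candidate to the traces). It does not implement the paper's near-linear time algorithm. The channel below is an assumed simplification (each symbol is deleted with probability $p$, and a uniformly random bit is inserted before it with probability $p$); the paper's exact channel definition may differ. The names `channel`, `edit_distance`, and `median_objective` are hypothetical.

```python
import random


def channel(s, p, rng):
    """Pass s through a simple insertion-deletion channel.

    Assumed model (not taken from the paper): before each symbol, a uniformly
    random bit is inserted with probability p; the symbol itself is deleted
    with probability p.
    """
    out = []
    for ch in s:
        if rng.random() < p:       # insertion before the current symbol
            out.append(rng.choice("01"))
        if rng.random() >= p:      # symbol survives unless deleted
            out.append(ch)
    return "".join(out)


def edit_distance(a, b):
    """Textbook O(|a| * |b|) dynamic program for edit (Levenshtein) distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def median_objective(candidate, traces):
    """Sum of edit distances from candidate to all traces.

    A median string minimizes this quantity over all strings.
    """
    return sum(edit_distance(candidate, t) for t in traces)


rng = random.Random(0)
n, p = 200, 0.05
s = "".join(rng.choice("01") for _ in range(n))   # average-case input: s is random
traces = [channel(s, p, rng) for _ in range(3)]   # only three traces

# Each trace alone is already within roughly O(pn) edits of s: the baseline bound.
print("distance from s to each trace:", [edit_distance(s, t) for t in traces])
# Median objective of s itself versus the best single trace used as a candidate.
print("objective of s:", median_objective(s, traces))
print("best trace objective:", min(median_objective(t, traces) for t in traces))
```

Returning any single trace gives the straightforward $O(pn)$ guarantee; the abstract's claim is that a $(1+\epsilon)$-approximate minimizer of `median_objective` lands closer to $s$, within $O(\epsilon p n)$.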