On the Asymptotic Rate of Optimal Codes That Correct Tandem Duplications for Nanopore Sequencing
Authors: Wenjun Yu; Zuo Ye; Moshe Schwartz
Journal: IEEE Transactions on Information Theory, vol. 71, no. 5, pp. 3569-3581
Publication date: 2025-02-27
DOI: 10.1109/TIT.2025.3544875
URL: https://ieeexplore.ieee.org/document/10906628/
Impact factor: 2.2; JCR: Q3 (Computer Science, Information Systems)
Citations: 0
Abstract
We study codes that can correct backtracking errors during nanopore sequencing. In this channel, a sequence of length $n$ over an alphabet of size $q$ is read by a sliding window of length $\ell$, and from each window we obtain only its composition. Backtracking errors cause some windows to repeat, and so manifest as tandem-duplication errors of fixed length $k$ in the $\ell$-read vector of window compositions. While existing constructions for duplication-correcting codes can be straightforwardly adapted to this model, even resulting in optimal codes, their asymptotic rate is hard to find. In the regime of an unbounded number of duplication errors, we give either the exact asymptotic rate of optimal codes or bounds on it, depending on the values of $k$, $\ell$, and $q$. In the regime of a constant number of duplication errors, $t$, we find the redundancy of optimal codes to be $t\log_{q} n+O(1)$ when $\ell \mid k$, and only upper bounded by this quantity otherwise.
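The channel described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the function names `read_vector` and `tandem_duplicate` are hypothetical, a composition is represented as a sorted tuple of (symbol, count) pairs, and only full windows are read (the paper's exact treatment of windows at the sequence boundaries is an assumption left out here).

```python
from collections import Counter


def read_vector(x, ell):
    """Compositions of all full length-ell windows of x.

    Assumption: boundary (partial) windows are omitted, so the result
    has len(x) - ell + 1 entries. Each composition is a sorted tuple of
    (symbol, count) pairs, which forgets the order inside the window.
    """
    return [
        tuple(sorted(Counter(x[i:i + ell]).items()))
        for i in range(len(x) - ell + 1)
    ]


def tandem_duplicate(v, start, k):
    """Return v with the length-k block v[start:start+k] repeated in place."""
    if k <= 0 or start < 0 or start + k > len(v):
        raise ValueError("duplicated block must lie inside the vector")
    return v[:start + k] + v[start:start + k] + v[start + k:]


# Hypothetical example: a sequence over {A, C, G, T} read with ell = 2,
# followed by one backtracking event that repeats k = 2 consecutive windows.
reads = read_vector("ACGTA", 2)
noisy = tandem_duplicate(reads, 1, 2)
```

Here `noisy` is two entries longer than `reads`, since one tandem duplication of length `k` inserts a copy of `k` consecutive window compositions directly after the originals.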
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.