A New Version of q-Ary Varshamov-Tenengolts Codes With More Efficient Encoders: The Differential VT Codes and The Differential Shifted VT Codes

IF 2.2 · Zone 3, Computer Science · JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Tuan Thanh Nguyen;Kui Cai;Paul H. Siegel
IEEE Transactions on Information Theory, vol. 70, no. 10, pp. 6989-7004. Published 2024-06-26. Journal article. DOI: 10.1109/TIT.2024.3417894. https://ieeexplore.ieee.org/document/10571999/
Citations: 0

Abstract

The problem of correcting deletions and insertions has recently received significantly increased attention because of DNA-based data storage, in which deletions and insertions occur with extremely high probability. In this work, we study the construction of non-binary codes correcting a burst of deletions or insertions. In particular, for the quaternary alphabet, our codes are suited to correcting a burst of deletions/insertions in DNA storage. Non-binary codes correcting a single deletion or insertion were introduced by Tenengolts (1984), and the results were extended to a fixed-length burst of deletions or insertions by Schoeny et al. (2017). Recently, Wang et al. (2021) proposed constructions of non-binary codes of length n correcting a burst of length at most two for q-ary alphabets, with redundancy $\log n+O(\log q \log \log n)$ bits, for arbitrary even q. The common idea in those constructions is to convert non-binary sequences into binary sequences, so that decoding the q-ary sequences relies mainly on successfully recovering the corresponding binary sequences. In this work, we consider a natural alternative in which error detection and correction are performed directly over q-ary sequences; for certain cases, our codes admit a more efficient encoder with lower redundancy than the best-known encoder in the literature. (Single-error correction codes) We first present a new version of non-binary VT codes capable of correcting a single deletion or single insertion, providing a simpler and more efficient alternative to the encoder for the construction of Tenengolts (1984). Our construction is based on the differential vector, and the codes are referred to as the differential VT codes.
In addition, we provide linear-time algorithms that encode user messages into these codes of length n over the q-ary alphabet, for $q \geqslant 2$, with at most $\lceil \log_{q} n\rceil +1$ redundant symbols, while the optimal redundancy is at least $\log_{q} n+\log_{q} (q-1)$ symbols. Our encoder reduces the redundancy of the best-known encoder of Tenengolts (1984) by at least 2 symbols, or equivalently $2\log_{2} q$ bits. (Burst-error correction codes) We use the idea of the binary shifted VT codes to define the q-ary differential shifted VT codes, and propose non-binary codes correcting a burst of up to two deletions (or two insertions) with redundancy $\log n+3\log \log n+ O(\log q)$ bits, which improves on the recent result of Wang et al. (2021), with redundancy $\log n+O(\log q \log \log n)$ bits, for all $q\geqslant 8$. We then extend the construction to design non-binary codes correcting a burst of either exactly or at most t deletions (or insertions) for arbitrary $t\geqslant 2$.
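The abstract names the differential vector and the differential VT codes without defining them. The Python sketch below illustrates one reading, and every definition in it is an assumption rather than something quoted from the abstract: the differential vector of x is taken to be y with y_i = x_i - x_{i+1} (mod q) for i < n and y_n = x_n, the syndrome is the weighted sum of i * y_i with 1-based weights, and a codeword is any x whose syndrome equals a modulo q*n. The function names are hypothetical. The brute-force check only confirms, for tiny parameters, that the single-deletion balls of distinct codewords do not overlap; it is neither the paper's linear-time encoder nor its decoder.

```python
from itertools import product


def diff_vector(x, q):
    """Assumed differential vector: y_i = x_i - x_{i+1} (mod q), y_n = x_n."""
    n = len(x)
    return [(x[i] - x[i + 1]) % q for i in range(n - 1)] + [x[-1]]


def syndrome(y):
    """Weighted sum of the symbols of y, using 1-based position weights."""
    return sum(i * s for i, s in enumerate(y, start=1))


def diff_vt_code(n, q, a):
    """All length-n q-ary words whose differential syndrome is a (mod q*n).

    Exhaustive enumeration, so only practical for very small n and q.
    """
    modulus = q * n
    return [x for x in product(range(q), repeat=n)
            if syndrome(diff_vector(x, q)) % modulus == a % modulus]


def deletion_ball(x):
    """Every word obtainable from the tuple x by deleting one symbol."""
    return {x[:i] + x[i + 1:] for i in range(len(x))}


def corrects_single_deletion(code):
    """True if no two codewords share a single-deletion descendant."""
    seen = set()
    for x in code:
        ball = deletion_ball(x)
        if seen & ball:
            return False
        seen |= ball
    return True


if __name__ == "__main__":
    n, q = 3, 2
    for a in range(q * n):
        code = diff_vt_code(n, q, a)
        print(a, code, corrects_single_deletion(code))
```

Under these assumptions the q*n syndrome classes partition all q^n words, so by the pigeonhole principle at least one class contains at least q^n / (q*n) codewords, that is, at most log_q n + 1 redundant symbols. This is in line with the $\lceil \log_{q} n\rceil +1$ figure quoted above, although the abstract itself does not give this derivation.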
Source journal
IEEE Transactions on Information Theory (Engineering & Technology: Electrical and Electronic Engineering)
CiteScore: 5.70
Self-citation rate: 20.00%
Articles published: 514
Review time: 12 months
Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.