A New Algebraic Approach for String Reconstruction From Substring Compositions

IF 2.2 · Zone 3, Computer Science · JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Utkarsh Gupta;Hessam Mahdavifar
DOI: 10.1109/TIT.2024.3493762
Journal: IEEE Transactions on Information Theory, vol. 71, no. 1, pp. 125-137
Publication date: 2024-11-15
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10754998/
Citation count: 0

Abstract

In this paper, we propose a new algorithm for the problem of reconstructing a string from its substring composition multiset. Motivated by applications in polymer-based data storage, where strings are recovered from tandem mass-spectrometry sequencing, the proposed algorithm leverages the equivalent polynomial formulation of the problem, which facilitates efficient parallel implementation. The computational complexity of the proposed reconstruction algorithm is upper bounded by $6.5n^{2}$ finite field operations, where the field size is upper bounded by $10n$, implying that the computational complexity is upper bounded by $6.5n^{2}(3.22+\log n)$ binary operations. Furthermore, the algorithm allows parallelization, leading to $O(n \log n)$ reconstruction latency. We characterize sufficient conditions on a length-$n$ binary string that guarantee that its reconstruction time complexity is polynomially bounded. Moreover, these sufficient conditions are more general than the conditions for the algorithm by Acharya et al. This characterization is used to construct new codebooks of reconstruction codes that have efficient encoding procedures and are larger, by at least a linear factor in size, than the previously best known construction by Pattabiraman et al. (2023).
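To make the term "substring composition multiset" concrete, the following is a minimal illustrative sketch, not the algorithm proposed in the paper. It enumerates every contiguous substring of a binary string by brute force and records only its composition, that is, how many 0s and 1s it contains, since a mass measurement reveals the symbol counts but not their order. The function name `composition_multiset` and the example string are assumptions introduced here for illustration.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Return the multiset of (zeros, ones) counts over all contiguous substrings of s.

    Hypothetical helper for illustration only; it is a brute-force enumeration,
    not the reconstruction algorithm described in the abstract.
    """
    n = len(s)
    multiset = Counter()
    for i in range(n):
        zeros = ones = 0
        # Extend the substring s[i..j] one symbol at a time and record its composition.
        for j in range(i, n):
            if s[j] == "1":
                ones += 1
            else:
                zeros += 1
            multiset[(zeros, ones)] += 1
    return multiset


s = "1011"
c = composition_multiset(s)

# A length-n string has n(n+1)/2 contiguous substrings.
assert sum(c.values()) == len(s) * (len(s) + 1) // 2

# A string and its reversal always share the same composition multiset,
# so reconstruction can only ever be unique up to reversal.
assert composition_multiset(s[::-1]) == c
```

The reconstruction problem runs in the opposite direction: given only such a multiset, recover the string, up to the reversal ambiguity shown by the last assertion.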
Source journal
IEEE Transactions on Information Theory (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 5.70
Self-citation rate: 20.00%
Articles published: 514
Review time: 12 months
Journal description: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.