A New Algebraic Approach for String Reconstruction From Substring Compositions
Authors: Utkarsh Gupta; Hessam Mahdavifar
Journal: IEEE Transactions on Information Theory, vol. 71, no. 1, pp. 125-137
Publication date: 2024-11-15
DOI: 10.1109/TIT.2024.3493762
URL: https://ieeexplore.ieee.org/document/10754998/
Citation count: 0
Abstract
In this paper, we propose a new algorithm for the problem of reconstructing a string from its substring composition multiset. Motivated by applications in polymer-based data storage, where strings are recovered from tandem mass-spectrometry sequencing, the proposed algorithm leverages the equivalent polynomial formulation of the problem, which facilitates efficient parallel implementation. The computational complexity of the proposed reconstruction algorithm is upper bounded by $6.5n^{2}$ finite field operations, where the field size is upper bounded by $10n$, implying that the computational complexity is upper bounded by $6.5n^{2}(3.22+\log {n})$ binary operations. Furthermore, it allows parallelization leading to $O(n \log n)$ reconstruction latency. We characterize sufficient conditions on a length-$n$ binary string that guarantee that its reconstruction time complexity is polynomially bounded. Moreover, these sufficient conditions are more general than the conditions for the algorithm by Acharya et al. This is used to construct new codebooks of reconstruction codes that have efficient encoding procedures and are larger, by at least a linear factor in size, than the previously best known construction by Pattabiraman et al. (2023).
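To make the input of the reconstruction problem concrete, the following minimal sketch computes the substring composition multiset of a binary string by brute force. It only illustrates the problem definition and is not the reconstruction algorithm proposed in the paper; the function name `composition_multiset` and the `(zeros, ones)` representation of a composition are assumptions chosen for illustration.

```python
from collections import Counter


def composition_multiset(s: str) -> Counter:
    """Return the multiset of (zeros, ones) counts over all contiguous substrings of s.

    Hypothetical helper for illustration only; s is assumed to be a binary
    string over the characters '0' and '1'.
    """
    n = len(s)
    # prefix_ones[i] holds the number of '1' characters in s[:i].
    prefix_ones = [0]
    for ch in s:
        prefix_ones.append(prefix_ones[-1] + (ch == "1"))

    multiset = Counter()
    for i in range(n):
        for j in range(i + 1, n + 1):
            ones = prefix_ones[j] - prefix_ones[i]
            zeros = (j - i) - ones
            # Only the symbol counts are recorded, not the order of symbols.
            multiset[(zeros, ones)] += 1
    return multiset


s = "1011"
m = composition_multiset(s)
print(sum(m.values()))                      # number of substrings, n(n+1)/2
print(m == composition_multiset(s[::-1]))   # a string and its reversal agree
```

Because a composition records only symbol counts, a string and its reversal always produce the same multiset, so reconstruction can be unique at best up to reversal.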
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.