{"title":"Geno-Weaving: A Framework for Low-Complexity Capacity-Achieving DNA Data Storage","authors":"Hsin-Po Wang;Venkatesan Guruswami","doi":"10.1109/JSAIT.2025.3610643","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3610643","url":null,"abstract":"As a potential implementation of data storage using DNA molecules, multiple strands of DNA are stored unordered in a liquid container. When the data are needed, an array of DNA readers will sample the strands with replacement, producing a Poisson-distributed number of noisy reads for each strand. The primary challenge here is to design an algorithm that reconstructs data from these unsorted, repetitive, and noisy reads. In this paper, we lay down a capacity-achieving rateless code along each strand to encode its index; we then lay down a capacity-achieving block code at the same position across all strands to protect the data. These codes weave a low-complexity storage scheme that saturates the fundamental upper limit of DNA. This improves upon the previous work of Weinberger and Merhav, which proves said bound and uses high-complexity random codes to saturate the limit. 
Our scheme also differs from other concatenation-based implementations of DNA data storage in the sense that, instead of decoding the inner codes first and passing the results to the outer code, our decoder alternates between the rateless codes and the block codes.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"383-393"},"PeriodicalIF":2.2,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145315409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tailoring Fault-Tolerance to Quantum Algorithms","authors":"Zhuangzhuang Chen;Narayanan Rengaswamy","doi":"10.1109/JSAIT.2025.3602446","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3602446","url":null,"abstract":"The standard approach to universal fault-tolerant quantum computing is to develop a general-purpose quantum error correction mechanism that can implement a universal set of logical gates fault-tolerantly. Given such a scheme, any quantum algorithm can be realized fault-tolerantly by composing the relevant logical gates from this set. However, we know that quantum computers provide a significant quantum advantage only for specific quantum algorithms. Hence, a universal quantum computer can likely gain from compiling such specific algorithms using tailored quantum error correction schemes. In this work, we take the first steps towards such algorithm-tailored quantum fault-tolerance. We consider Trotter circuits in quantum simulation, which is an important application of quantum computing. We develop a solve-and-stitch algorithm to systematically synthesize physical realizations of Clifford Trotter circuits on the well-known $[\\![n,n-2,2]\\!]$ error-detecting code family. Our analysis shows that this family implements Trotter circuits with essentially optimal depth under reasonable assumptions, thereby serving as an illuminating example of tailored quantum error correction. We achieve fault-tolerance for these circuits using flag gadgets, which add minimal overhead. Importantly, the solve-and-stitch algorithm has the potential to scale beyond this specific example, as illustrated by a generalization to the four-qubit logical Clifford Trotter circuit on the $[\\![20,4,2]\\!]$ hypergraph product code, thereby providing a principled approach to tailored fault-tolerance in quantum computing.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"311-324"},"PeriodicalIF":2.2,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Codeword Stabilized Codes From m-Uniform Graph States","authors":"Sowrabh Sudevan;Sourin Das;Thamadathil Aswanth;Nupur Patanker;Navin Kashyap","doi":"10.1109/JSAIT.2025.3602744","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3602744","url":null,"abstract":"An m-uniform quantum state on n qubits is an entangled state in which every m-qubit subsystem is maximally mixed. Starting with an m-uniform state realized as the graph state associated with an m-regular graph, and a classical $[n,k,d \\ge m+1]$ binary linear code with certain additional properties, we show that pure $[[n,k,m+1]]_{2}$ quantum error-correcting codes (QECCs) can be constructed within the codeword stabilized (CWS) code framework. As illustrations, we construct pure $[[{2^{2r}-1,2^{2r}-2r-3,3}]]_{2}$ and $[[(2^{4r}-1)^{2}, (2^{4r}-1)^{2} - 32r-7, 5]]_{2}$ QECCs. We also give measurement-based protocols for encoding into code states and for recovery of logical qubits from code states.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"296-310"},"PeriodicalIF":2.2,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shadow Area and Degrees of Freedom for Free-Space Communication","authors":"Mats Gustafsson","doi":"10.1109/JSAIT.2025.3600363","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3600363","url":null,"abstract":"The number of degrees of freedom (NDoF) in a communication channel fundamentally limits the number of independent spatial modes available for transmitting and receiving information. Although the NDoF can be computed numerically for specific configurations using singular value decomposition (SVD) of the channel operator, this approach provides limited physical insight. In this paper, we introduce a simple analytical estimate for the NDoF between arbitrarily shaped transmitter and receiver regions in free space. In the electrically large limit, where the NDoF is high, it is well approximated by the mutual shadow area, measured in units of wavelength squared. This area corresponds to the projected overlap of the regions, integrated over all lines of sight, and captures their effective spatial coupling. The proposed estimate generalizes and unifies several previously established results, including those based on Weyl’s law, shadow area, and the paraxial approximation. We analyze several example configurations to illustrate the accuracy of the estimate and validate it through comparisons with numerical SVD computations of the propagation channel. 
The results provide both practical tools and physical insight for the design and analysis of high-capacity communication and sensing systems.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"325-337"},"PeriodicalIF":2.2,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145110296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achievable Rates and Error Probability Bounds of Frequency-Based Channels of Unlimited Input Resolution","authors":"Ran Tamir;Nir Weinberger","doi":"10.1109/JSAIT.2025.3599794","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3599794","url":null,"abstract":"We consider a molecular channel, in which messages are encoded to the frequency of objects in a pool, and whose output during reading time is a noisy version of the input frequencies, as obtained by sampling with replacement from the pool. Motivated by recent DNA storage techniques, we focus on the regime in which the input resolution is unlimited. We propose two error probability bounds for this channel; the first bound is based on random coding analysis of the error probability of the maximum likelihood decoder and the second bound is derived by code expurgation techniques. We deduce an achievable bound on the capacity of this channel, and compare it to both the achievable bounds under limited input resolution, as well as to a converse bound.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"283-295"},"PeriodicalIF":2.2,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145061913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achievable Rates of Nanopore-Based DNA Storage","authors":"Brendon McBain;Emanuele Viterbo","doi":"10.1109/JSAIT.2025.3598756","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3598756","url":null,"abstract":"This paper studies achievable rates of nanopore-based DNA storage when nanopore signals are decoded using a tractable channel model that does not rely on a basecalling algorithm. Specifically, the noisy nanopore channel (NNC) with the Scrappie pore model generates average output levels via i.i.d. geometric sample duplications corrupted by i.i.d. Gaussian noise (NNC-Scrappie). Simplified message passing algorithms are derived for efficient soft decoding of nanopore signals using NNC-Scrappie. Previously, evaluation of this channel model was limited by the lack of DNA storage datasets with nanopore signals included. This is solved by deriving an achievable rate based on the dynamic time-warping (DTW) algorithm that can be applied to genomic sequencing datasets subject to constraints that make the resulting rate applicable to DNA storage. Using a publicly available dataset from Oxford Nanopore Technologies (ONT), it is demonstrated that coding over multiple DNA strands of 100 bases in length and decoding with the NNC-Scrappie decoder can achieve rates of at least 0.64 to 1.18 bits per base, depending on the channel quality of the nanopore that is chosen in the sequencing device per channel-use, and 0.96 bits per base on average assuming uniformly chosen nanopores. 
These rates are pessimistic since they only apply to single reads and do not include calibration of the pore model to specific nanopores.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"261-269"},"PeriodicalIF":2.2,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Achievable Rates Over Noisy Nanopore Channels","authors":"V. Arvind Rameshwar;Nir Weinberger","doi":"10.1109/JSAIT.2025.3598773","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3598773","url":null,"abstract":"In this paper, we consider a recent channel model of a nanopore sequencer proposed by McBain, Viterbo, and Saunderson (2024), termed the noisy nanopore channel (NNC). In essence, an NNC is a duplication channel with structured, Markov inputs, that is corrupted by memoryless noise. We first discuss a (tight) lower bound on the capacity of the NNC in the absence of random noise. Next, we present lower and upper bounds on the channel capacity of general noisy nanopore channels. We then consider two interesting regimes of operation of an NNC: first, where the memory of the input process is large and the random noise introduces erasures, and second, where the rate of measurements of the electric current (also called the sampling rate) is high. For these regimes, we show that it is possible to achieve information rates close to the noise-free capacity, using low-complexity encoding and decoding schemes. In particular, our decoder for the regime of high sampling rates makes use of a change-point detection procedure – a subroutine of immediate relevance for practitioners.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"270-282"},"PeriodicalIF":2.2,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144990072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequence Reconstruction for the Single-Deletion Single-Substitution Channel","authors":"Wentu Song;Kui Cai;Tony Q. S. Quek","doi":"10.1109/JSAIT.2025.3597013","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3597013","url":null,"abstract":"The central problem in sequence reconstruction is to find the minimum number of distinct channel outputs required to uniquely reconstruct the transmitted sequence. According to Levenshtein’s work in 2001, this number is determined by the size of the maximum intersection between the error balls of any two distinct input sequences of the channel. In this work, we study the sequence reconstruction problem for the q-ary single-deletion single-substitution channel for any fixed integer $q\\geq 2$. First, we prove that if two q-ary sequences of length n have a Hamming distance $d\\geq 2$, then the intersection size of their error balls is upper bounded by $2qn-3q-2-\\delta_{q,2}$, where $\\delta_{i,j}$ is the Kronecker delta, and this bound is achievable. 
Next, we prove that if two q-ary sequences have a Hamming distance $d\\geq 3$ and a Levenshtein distance $d_{\\text{L}}\\geq 2$, then the intersection size of their error balls is upper bounded by $3q+11$, and we show that the gap between this bound and the tight bound is at most 2.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"232-247"},"PeriodicalIF":2.2,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144914375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey of Sequence Reconstruction Problems and Their Applications in DNA-Based Storage","authors":"Yaoyu Yang","doi":"10.1109/JSAIT.2025.3595457","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3595457","url":null,"abstract":"In DNA sequencing, we often need to infer an unknown sequence from a collection of its corrupted copies. Each copy cannot faithfully tell the truth due to DNA fragmentation, point mutations, and measurement errors. The theoretical guarantee of unique reconstruction is thus of concern. This motivated the study of sequence reconstruction problems three decades ago. Recently, synthetic DNA has been regarded as an ultra-dense data storage medium. Sequence reconstruction is a crucial step in achieving reliable and efficient data readout. In this survey, we summarize mainly two types of problems, reconstruction from subsequences or substrings, in both combinatorial and probabilistic settings. Meanwhile, we discuss codes and algorithms that may assist with the future development of DNA-based data storage systems.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"352-366"},"PeriodicalIF":2.2,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145141744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Input Optimization in the Composite DNA Storage Channel","authors":"Adir Kobovich;Nir Weinberger","doi":"10.1109/JSAIT.2025.3595005","DOIUrl":"https://doi.org/10.1109/JSAIT.2025.3595005","url":null,"abstract":"Recent advancements in DNA storage show that composite DNA letters can significantly enhance storage capacity. We model this process as a multinomial channel and propose an optimization algorithm to determine its capacity-achieving input distribution (CAID) for an arbitrary number of output reads. Our empirical results match a scaling law that determines that the support size grows exponentially with capacity. In addition, we introduce a limited-support optimization algorithm that optimizes the input distribution under a restricted support size, making it more feasible for real-world DNA storage systems. We also extend our model to account for noise and study its effect on capacity and input design.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"6 ","pages":"248-260"},"PeriodicalIF":2.2,"publicationDate":"2025-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144998191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}