Neural Polar Decoders for DNA Data Storage
Ziv Aharoni; Henry D. Pfister
IEEE Journal on Selected Areas in Information Theory, vol. 6, pp. 403-416 (Journal Article), published 2025-09-16
DOI: 10.1109/JSAIT.2025.3610751
https://ieeexplore.ieee.org/document/11165383/
Citations: 0
Abstract
Synchronization errors, arising from both synthesis and sequencing noise, present a fundamental challenge in DNA-based data storage systems. These errors are often modeled as insertion-deletion-substitution (IDS) channels, for which maximum-likelihood decoding is computationally expensive. In this work, we propose a data-driven approach based on neural polar decoders (NPDs) to design reduced-complexity decoders for channels with synchronization errors. The proposed architecture enables decoding over IDS channels with complexity $O(A N \log N)$, where $N$ is the block length and $A$ is a tunable parameter independent of the channel. NPDs require only sample access to the channel and can be trained without an explicit channel model. Additionally, NPDs provide mutual information (MI) estimates that can be used to optimize input distributions and code design. We demonstrate the effectiveness of NPDs on both synthetic deletion and IDS channels. For deletion channels, we show that NPDs achieve near-optimal decoding performance and accurate MI estimation, with significantly lower complexity than trellis-based decoders. We also provide numerical estimates of the channel capacity for the deletion channel. We extend our evaluation to realistic DNA storage settings, including channels with multiple noisy reads and real-world Nanopore sequencing data. Our results show that NPDs match or surpass the performance of existing methods while using significantly fewer parameters than state-of-the-art methods. These findings highlight the promise of NPDs for robust and efficient decoding in DNA data storage systems.
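A key operational point above is that an NPD needs only sample access to the channel, that is, pairs of transmitted and received sequences, with no explicit likelihood model. The following is a minimal sketch of such a sampler, not the authors' code. It assumes a binary input alphabet, the polar transform applied without the bit-reversal permutation, and one simplified i.i.d. IDS model: uniformly random insertions, deletions, and substitutions occurring with probabilities `p_ins`, `p_del`, and `p_sub`. All function and parameter names are hypothetical, and the neural decoder itself is not shown. The `log2(N)` butterfly stages in `polar_transform` are where the $N \log N$ factor of polar encoding and successive-cancellation decoding comes from. The tunable factor $A$ is not modeled here.

```python
import numpy as np


def polar_transform(u: np.ndarray) -> np.ndarray:
    """Compute x = u F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]] and no bit reversal.

    Runs log2(N) butterfly stages of N/2 XORs each, i.e. O(N log N) work.
    """
    x = u.copy() % 2
    n_len = x.size
    assert n_len & (n_len - 1) == 0, "block length must be a power of two"
    half = 1
    while half < n_len:
        for start in range(0, n_len, 2 * half):
            # Butterfly (a, b) -> (a XOR b, b) on each pair of half-blocks
            x[start:start + half] ^= x[start + half:start + 2 * half]
        half *= 2
    return x


def ids_channel(x, p_ins, p_del, p_sub, q=2, rng=None):
    """Sample y from a simplified i.i.d. IDS channel (an assumed model; needs p_ins < 1)."""
    rng = np.random.default_rng() if rng is None else rng
    y = []
    for symbol in x:
        # Geometric number of uniformly random insertions before each input symbol
        while rng.random() < p_ins:
            y.append(int(rng.integers(q)))
        if rng.random() < p_del:
            continue  # deletion: the input symbol is dropped
        if rng.random() < p_sub:
            # Substitution: replace with a different symbol chosen uniformly at random
            y.append(int((symbol + rng.integers(1, q)) % q))
        else:
            y.append(int(symbol))
    return np.array(y, dtype=np.int64)


def sample_training_pair(n_len, p_ins, p_del, p_sub, rng):
    """Draw one (u, x, y) triple: random message bits, polar codeword, channel output."""
    u = rng.integers(0, 2, size=n_len)
    x = polar_transform(u)
    y = ids_channel(x, p_ins, p_del, p_sub, q=2, rng=rng)
    return u, x, y


rng = np.random.default_rng(0)
u, x, y = sample_training_pair(16, 0.0, 0.1, 0.0, rng)
print(len(x), len(y))  # len(y) <= 16 because this is a pure deletion channel
```

Note that the output length varies from sample to sample, which is what breaks the fixed-length assumptions of standard polar decoders. A data-driven decoder trained on such pairs never needs the channel's transition probabilities, only the ability to draw more samples.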