Effective IDS Error Correction Algorithms for DNA Storage Channels with Multiple Output Sequences
Caiyun Deng, Guojun Han, Pengchao Han, Yi Fang
IEEE Transactions on NanoBioscience, published 2025-04-08
DOI: 10.1109/TNB.2025.3558853 (https://doi.org/10.1109/TNB.2025.3558853)
Abstract
DNA data storage is a cutting-edge storage technique, valued for its high density, replicability, and long-term storage capability. A DNA storage system comprises an encoding process, insertion, deletion, and substitution (IDS) channels that model data synthesis and sequencing, and a decoding process. IDS channels with multiple output sequences introduce IDS errors, which complicate decoding and degrade the performance of DNA data storage. To address this issue, we investigate effective IDS error correction algorithms for two encoding schemes in DNA data storage. In the encoding process, we use marker codes (MC) and embedded marker codes (EMC), respectively, as inner codes, each concatenated with low-density parity-check (LDPC) codes as outer codes. First, we propose the segmented progressive matching (SPM) algorithm to infer the consensus sequence from multiple output sequences, thereby facilitating the decoding process. Second, when MC is the inner code, we propose a synchronous decoding algorithm based on the hidden Markov model (SDH) to infer the a posteriori probability (APP) of base symbols, which supports the external decoding algorithm. Third, when EMC is the inner code, we propose the iterative external decoding (IED) algorithm. IED integrates synchronous decoding with embedded normalized min-sum (ENMS) decoding to obtain an enhanced APP for external decoding, enabling a lower bit-error rate (BER). We also reduce the complexity of the external decoder by minimizing check-node computations. A comparison of the two schemes shows that the SDH algorithm with MC as the inner code offers a lightweight solution for DNA data storage, whereas IED with EMC delivers superior decoding performance at a complexity that scales linearly with the number of iterations. Simulation results show that, compared with existing studies, the proposed decoding algorithms reduce the BER by 21.72% to 99.75%.
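To make the channel model in the abstract concrete, the following is a minimal Python sketch of an IDS channel that yields multiple output sequences, together with a simple periodic marker insertion. It is an illustration under stated assumptions, not the authors' implementation: the function names (`insert_markers`, `ids_channel`, `multiple_reads`), the marker pattern, the marker period, and the per-base error model are all hypothetical choices, and the SPM, SDH, and IED algorithms themselves are not reproduced here.

```python
import random

BASES = "ACGT"


def insert_markers(payload: str, marker: str = "AC", period: int = 8) -> str:
    """Place a known marker between consecutive blocks of `period` payload bases.

    Assumption: a plain periodic marker layout; the paper's MC/EMC designs may differ.
    """
    chunks = [payload[i:i + period] for i in range(0, len(payload), period)]
    return marker.join(chunks)


def ids_channel(seq: str, p_ins: float, p_del: float, p_sub: float,
                rng: random.Random) -> str:
    """Pass `seq` through a simple per-base insertion/deletion/substitution channel.

    Assumption: at most one random base is inserted before each input base,
    and each input base is then deleted, substituted, or kept.
    """
    out = []
    for base in seq:
        # Insertion: emit one uniformly random base before the current base.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            # Deletion: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace the base with a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            # Correct transmission.
            out.append(base)
    return "".join(out)


def multiple_reads(seq: str, n_reads: int, p_ins: float, p_del: float,
                   p_sub: float, seed: int = 0) -> list[str]:
    """Produce `n_reads` independent noisy copies of `seq` (the multiple outputs)."""
    rng = random.Random(seed)
    return [ids_channel(seq, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]


if __name__ == "__main__":
    codeword = insert_markers("ACGTTGCAACGTTGCA", marker="AC", period=8)
    for read in multiple_reads(codeword, n_reads=4, p_ins=0.02, p_del=0.02, p_sub=0.02):
        print(read)
```

Because insertions and deletions change the read length, the noisy reads are generally misaligned with one another and with the transmitted sequence. That misalignment is what a consensus step and a synchronization-aware inner decoder have to resolve before the outer LDPC decoder can run.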
About the journal:
The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). The journal covers a broad spectrum of topics, spanning both foundations and applications. Specifically, it covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software for data acquisition and analysis, and computer-based modelling (using traditional or high-performance computing, such as parallel computers or computer networks).