Effective IDS Error Correction Algorithms for DNA Storage Channels with Multiple Output Sequences.

IF 3.7 · Zone 4 (Biology) · JCR Q1 (Biochemical Research Methods)
Caiyun Deng, Guojun Han, Pengchao Han, Yi Fang
Journal: IEEE Transactions on NanoBioscience (Journal Article), volume PP
Published: 2025-04-08
DOI: 10.1109/TNB.2025.3558853 (https://doi.org/10.1109/TNB.2025.3558853)
Citations: 0

Abstract

DNA data storage is a cutting-edge storage technique, valued for its high density, replicability, and long-term storage capability. A DNA storage system comprises an encoding process, insertion, deletion, and substitution (IDS) channels that model data synthesis and sequencing, and a decoding process. IDS channels that produce multiple output sequences are prone to IDS errors, which complicate decoding and degrade the performance of DNA data storage. To address this issue, we investigate effective IDS error correction algorithms for two encoding schemes. In the encoding process, we use marker codes (MC) and embedded marker codes (EMC), respectively, as inner codes, each concatenated with low-density parity-check (LDPC) codes as outer codes. First, we propose the segmented progressive matching (SPM) algorithm, which infers a consensus sequence from the multiple output sequences and thereby facilitates decoding. Second, when MC is the inner code, we propose a synchronous decoding algorithm based on the Hidden Markov Model (SDH) that infers the a posteriori probability (APP) of the base symbols and passes it to the external decoding algorithm. Third, when EMC is the inner code, we propose the iterative external decoding (IED) algorithm. IED integrates synchronous decoding with embedded normalized min-sum decoding (ENMS) to obtain an enhanced APP for external decoding, which enables a lower bit-error rate (BER). We also reduce the complexity of the external decoder by minimizing checksum node computations. A comparison of the two schemes shows that the SDH algorithm with MC as the inner code offers a lightweight solution for DNA data storage, whereas IED with EMC delivers superior decoding performance with a complexity that scales linearly with the number of iterations. Simulation results show that, compared with existing studies, the proposed decoding algorithms reduce the BER by 21.72% to 99.75%.
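
To make the channel model in the abstract concrete, the following is a minimal Python sketch of an IDS channel that returns several noisy reads of one DNA strand. It is an illustration under stated assumptions, not the authors' simulator: the independent per-base error model, the limit of at most one insertion per input base, and the names ids_channel, multiple_reads, p_ins, p_del, and p_sub are all hypothetical and do not come from the paper.

```python
import random

BASES = "ACGT"


def ids_channel(seq, p_ins, p_del, p_sub, rng):
    """Pass one DNA string through a simple i.i.d. IDS channel.

    Assumed model (illustrative only): before each input base, a random
    base is inserted with probability p_ins; the input base is then
    deleted with probability p_del, substituted by a different base with
    probability p_sub, or transmitted unchanged otherwise.
    """
    out = []
    for base in seq:
        # Insertion: emit one random base ahead of the current base.
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))
        r = rng.random()
        if r < p_del:
            # Deletion: the current base is dropped.
            continue
        if r < p_del + p_sub:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            # Correct transmission.
            out.append(base)
    return "".join(out)


def multiple_reads(seq, n_reads, p_ins, p_del, p_sub, seed=0):
    """Return n_reads independent noisy copies of seq (multiple output sequences)."""
    rng = random.Random(seed)
    return [ids_channel(seq, p_ins, p_del, p_sub, rng) for _ in range(n_reads)]


if __name__ == "__main__":
    for read in multiple_reads("ACGTACGTAC", n_reads=5, p_ins=0.02, p_del=0.02, p_sub=0.02):
        print(read)
```

Because insertions and deletions change the read length, the reads generally cannot be compared position by position. That is the kind of input a consensus step such as SPM has to reconcile before the inner and outer decoders run.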

Source journal: IEEE Transactions on NanoBioscience (Engineering and Technology: Nanoscience and Nanotechnology)
CiteScore: 7.00
Self-citation rate: 5.10%
Articles published: 197
Review time: >12 weeks
Journal description: The IEEE Transactions on NanoBioscience reports on original, innovative, and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered span both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software for data acquisition and analysis, and computer-based modelling (on traditional or high-performance computing platforms such as parallel computers or computer networks).