Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding

IF 3.7 · CAS Zone 4 (Biology) · JCR Q1 (Biochemical Research Methods)

IEEE Transactions on NanoBioscience · Pub Date: 2023-06-09 · DOI: 10.1109/TNB.2023.3284406

Jaeho Jeong; Hosung Park; Hee-Youl Kwak; Jong-Seon No; Hahyeon Jeon; Jeong Wook Lee; Jae-Won Kim

Vol. 23, No. 1, pp. 81-90 · https://ieeexplore.ieee.org/document/10147330/

Citations: 0

Abstract

Ever since deoxyribonucleic acid (DNA) was first considered as a next-generation data-storage medium, considerable research effort has gone into using error-correcting codes (ECCs) to correct errors that occur during synthesis, storage, and sequencing. Previous work on recovering data from an erroneous sequenced DNA pool has used hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and the robustness of the DNA storage system, we propose a new iterative soft decoding algorithm in which soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for computing log-likelihood ratios (LLRs) from quality scores (Q-scores), together with a redecoding method that may be suitable for error correction and detection in DNA sequencing. Building on the widely adopted fountain-code encoding scheme proposed by Erlich et al., we use three different sets of sequenced data to show that the performance evaluation is consistent. Compared with the state-of-the-art decoding method, the proposed soft decoding algorithm gives a 2.3% to 7.0% improvement in the reduction of the number of reads required, and it is shown to handle erroneous sequenced oligo reads containing insertion and deletion errors.
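The abstract does not state the authors' LLR formula. The minimal sketch below only illustrates the general idea of turning FASTQ Q-scores into soft information and contrasting it with a majority-vote (hard) decision. It is not the paper's method. It makes several assumptions: Phred+33 quality encoding, a hypothetical 2-bit base mapping (A=00, C=01, G=10, T=11), a substitution-only channel where a wrong call is equally likely to be any of the other three bases, and equal-length, already aligned reads. All function names are illustrative. The sketch also omits redecoding, the fountain-code and inner-code decoders, and indel handling.

```python
import math
from collections import Counter

# Hypothetical 2-bit mapping (assumption; match it to the encoder actually used).
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def fastq_quals(qual_line: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into Q-scores."""
    return [ord(ch) - offset for ch in qual_line]


def base_bit_llrs(base: str, q: int) -> tuple[float, float]:
    """Per-bit LLRs, log P(bit=0)/P(bit=1), for one called base.

    Assumes substitution errors only, with an erroneous call equally likely
    to be any of the other three bases (an illustrative channel model).
    """
    p_err = 10.0 ** (-q / 10.0)       # Phred relation: Q = -10 * log10(p_err)
    p_flip = 2.0 * p_err / 3.0        # 2 of the 3 wrong bases flip a given bit
    mag = math.log((1.0 - p_flip) / p_flip)
    b0, b1 = BASE_TO_BITS[base]
    return (mag if b0 == 0 else -mag, mag if b1 == 0 else -mag)


def soft_consensus(reads: list[tuple[str, str]]) -> str:
    """Sum per-bit LLRs over equal-length reads, then take hard decisions."""
    length = len(reads[0][0])
    totals = [[0.0, 0.0] for _ in range(length)]
    for seq, qual_line in reads:
        for i, (base, q) in enumerate(zip(seq, fastq_quals(qual_line))):
            l0, l1 = base_bit_llrs(base, q)
            totals[i][0] += l0
            totals[i][1] += l1
    return "".join(
        BITS_TO_BASE[(0 if t0 >= 0 else 1, 0 if t1 >= 0 else 1)]
        for t0, t1 in totals
    )


def majority_consensus(reads: list[tuple[str, str]]) -> str:
    """Hard baseline: per-position majority vote that ignores Q-scores."""
    columns = zip(*(seq for seq, _ in reads))
    return "".join(Counter(col).most_common(1)[0][0] for col in columns)


reads = [("AC", "II"), ("AG", "I#"), ("AG", "I#")]  # 'I' -> Q40, '#' -> Q2
print(majority_consensus(reads))  # AG
print(soft_consensus(reads))      # AC
```

In this toy example, one high-quality read (Q40) calls `C` at the second position, and two low-quality reads (Q2) call `G`. The majority vote follows the two low-quality calls. The summed LLRs follow the confident one. This shows how soft information from quality scores can change a decision that a majority rule would get wrong under this channel model.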

Source journal

IEEE Transactions on NanoBioscience (Engineering & Technology: Nanoscience & Nanotechnology)

CiteScore: 7.00

Self-citation rate: 5.10%

Articles published: 197

Review time: >12 weeks

Journal description: The IEEE Transactions on NanoBioscience reports on original, innovative and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). The journal covers a broad spectrum of topics, from foundations to applications. Specifically, it covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software for data acquisition and analysis, and computer-based modelling (using traditional or high-performance computing, such as parallel computers or computer networks).