Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding

IF 3.7 4区 生物学 Q1 BIOCHEMICAL RESEARCH METHODS
Jaeho Jeong;Hosung Park;Hee-Youl Kwak;Jong-Seon No;Hahyeon Jeon;Jeong Wook Lee;Jae-Won Kim
{"title":"Iterative Soft Decoding Algorithm for DNA Storage Using Quality Score and Redecoding","authors":"Jaeho Jeong;Hosung Park;Hee-Youl Kwak;Jong-Seon No;Hahyeon Jeon;Jeong Wook Lee;Jae-Won Kim","doi":"10.1109/TNB.2023.3284406","DOIUrl":null,"url":null,"abstract":"Ever since deoxyribonucleic acid (DNA) was considered as a next-generation data-storage medium, lots of research efforts have been made to correct errors occurred during the synthesis, storage, and sequencing processes using error correcting codes (ECCs). Previous works on recovering the data from the sequenced DNA pool with errors have utilized hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method which may be suitable for the error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency for the performance evaluation. The proposed soft decoding algorithm gives 2.3%\n<inline-formula> <tex-math>$\\sim $ </tex-math></inline-formula>\n7.0% improvement of the reading number reduction compared to the state-of-the-art decoding method and it is shown that it can deal with erroneous sequenced oligo reads with insertion and deletion errors.","PeriodicalId":13264,"journal":{"name":"IEEE Transactions on NanoBioscience","volume":"23 1","pages":"81-90"},"PeriodicalIF":3.7000,"publicationDate":"2023-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on NanoBioscience","FirstCategoryId":"99","ListUrlMain":"https://ieeexplore.ieee.org/document/10147330/","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Ever since deoxyribonucleic acid (DNA) was considered as a next-generation data-storage medium, lots of research efforts have been made to correct errors occurred during the synthesis, storage, and sequencing processes using error correcting codes (ECCs). Previous works on recovering the data from the sequenced DNA pool with errors have utilized hard decoding algorithms based on a majority decision rule. To improve the correction capability of ECCs and robustness of the DNA storage system, we propose a new iterative soft decoding algorithm, where soft information is obtained from FASTQ files and channel statistics. In particular, we propose a new formula for log-likelihood ratio (LLR) calculation using quality scores (Q-scores) and a redecoding method which may be suitable for the error correction and detection in the DNA sequencing area. Based on the widely adopted encoding scheme of the fountain code structure proposed by Erlich et al., we use three different sets of sequenced data to show consistency for the performance evaluation. The proposed soft decoding algorithm gives 2.3% $\sim $ 7.0% improvement of the reading number reduction compared to the state-of-the-art decoding method and it is shown that it can deal with erroneous sequenced oligo reads with insertion and deletion errors.
利用质量分数和重编码的 DNA 存储迭代软解码算法
自从脱氧核糖核酸(DNA)被视为下一代数据存储介质以来,人们一直在努力研究如何利用纠错码(ECC)纠正在合成、存储和测序过程中出现的错误。以前从有错误的 DNA 测序池中恢复数据的工作采用的是基于多数决定规则的硬解码算法。为了提高 ECC 的纠错能力和 DNA 存储系统的鲁棒性,我们提出了一种新的迭代软解码算法,其中软信息来自 FASTQ 文件和信道统计数据。特别是,我们提出了一种使用质量分数(Q-scores)计算对数似然比(LLR)的新公式和一种适合 DNA 测序领域纠错和检测的重解码方法。基于 Erlich 等人提出的被广泛采用的喷泉代码结构编码方案,我们使用了三组不同的测序数据来显示性能评估的一致性。与最先进的解码方法相比,所提出的软解码算法的读数减少率提高了 2.3% ∼ 7.0%,并证明它能处理带有插入和删除错误的错误寡核苷酸测序读数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
IEEE Transactions on NanoBioscience
IEEE Transactions on NanoBioscience 工程技术-纳米科技
CiteScore
7.00
自引率
5.10%
发文量
197
审稿时长
>12 weeks
期刊介绍: The IEEE Transactions on NanoBioscience reports on original, innovative and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal focus on a broad spectrum of aspects, both on foundations and on applications. Specifically, methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software data acquisition and analysis and computer based modelling are covered (based on traditional or high performance computing - parallel computers or computer networks).
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信