DNA palette code for time-series archival data storage

IF 16.3 | Zone 1 (multidisciplinary journals) | JCR Q1, MULTIDISCIPLINARY SCIENCES
Zihui Yan, Haoran Zhang, Boyuan Lu, Tong Han, Xiaoguang Tong, Yingjin Yuan
Journal: National Science Review. Published: 2024-09-09. Type: Journal Article. DOI: 10.1093/nsr/nwae321 (https://doi.org/10.1093/nsr/nwae321)
Citations: 0

Abstract

Preserving large volumes of infrequently accessed cold data over the long term is a challenge for the storage community. Deoxyribonucleic acid (DNA) is considered a promising medium because of its inherent physical stability and high storage density. Information density and decoding sequence coverage are two important metrics of the efficiency of DNA data storage. In this study, we propose a novel coding scheme, the DNA Palette code, suited to cold data and especially to time-series archival datasets, which are rarely accessed but require reliable long-term storage for retrospective research. The DNA Palette code represents binary information with unordered combinations of index-free oligonucleotides (oligos). It achieves high net information density in encoding and lossless decoding at low sequencing coverage. When sequencing reads are corrupted, it can still recover partial information, which prevents complete failure of file retrieval. In vivo testing of clinical brain magnetic resonance imaging (MRI) data storage, together with simulation validations on large-scale public MRI datasets (10 GB), planetary science datasets, and meteorological datasets, demonstrates the advantages of the coding scheme: high information density, low decoding sequence coverage, and wide applicability.
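The abstract does not spell out how an unordered combination of index-free oligos carries bits, so the following is only a toy sketch of the general idea and not the authors' construction. It assumes a fixed codebook of distinct oligos and uses the standard combinatorial number system to map an integer to a K-element subset of that codebook and back. The names `CODEBOOK`, `K`, `BITS`, `encode`, and `decode` are hypothetical, and the 3-nt "oligos" are far shorter than anything used in practice.

```python
from itertools import product
from math import comb
import random

# Assumption: a toy codebook of 64 distinct 3-nt strings stands in for real oligos.
CODEBOOK = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {oligo: i for i, oligo in enumerate(CODEBOOK)}
K = 4  # assumption: number of oligos chosen per combination
# Largest whole number of bits that fits in one K-subset of the codebook.
BITS = comb(len(CODEBOOK), K).bit_length() - 1


def unrank_combination(rank, n, k):
    """Map an integer in [0, C(n, k)) to a k-subset of range(n)."""
    if not 0 <= rank < comb(n, k):
        raise ValueError("rank out of range")
    chosen = []
    for i in range(k, 0, -1):
        c = i - 1  # comb(i - 1, i) == 0, so this is always a valid start
        while comb(c + 1, i) <= rank:
            c += 1
        chosen.append(c)
        rank -= comb(c, i)
    return frozenset(chosen)


def rank_combination(subset):
    """Inverse of unrank_combination; the order of elements is irrelevant."""
    return sum(comb(c, i) for i, c in enumerate(sorted(subset), start=1))


def encode(value):
    """Represent a BITS-bit integer as an unordered set of K oligos."""
    if not 0 <= value < 2 ** BITS:
        raise ValueError("value does not fit in one combination")
    return {CODEBOOK[i] for i in unrank_combination(value, len(CODEBOOK), K)}


def decode(reads):
    """Recover the integer from reads in any order, with duplicates allowed."""
    return rank_combination({INDEX[r] for r in reads})


# Usage: shuffle and duplicate the reads to mimic unordered sequencing output.
reads = list(encode(12345)) * 3
random.shuffle(reads)
assert decode(reads) == 12345
```

Because the value is carried by which oligos are present rather than by their order, no per-oligo index field is needed in this sketch. Error handling for missing or corrupted reads, which the abstract says the real scheme supports, is deliberately left out.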
Source journal
National Science Review (MULTIDISCIPLINARY SCIENCES)
CiteScore: 24.10
Self-citation rate: 1.90%
Articles published: 249
Review time: 13 weeks
About the journal: National Science Review (NSR; ISSN abbreviation: Natl. Sci. Rev.) is an English-language, peer-reviewed, multidisciplinary open-access scientific journal published by Oxford University Press under the auspices of the Chinese Academy of Sciences. According to Journal Citation Reports, its 2021 impact factor was 23.178. National Science Review publishes review articles and perspectives as well as original research in the form of brief communications and research articles.