Learning Structurally Stabilized Representations for Multi-modal Lossless DNA Storage

Ben Cao, Tiantian He, Xue Li, Bin Wang, Xiaohu Wu, Qiang Zhang, Yew-Soon Ong

arXiv:2408.00779 · arXiv - QuanBio - Biomolecules · Published 2024-07-17
Abstract
In this paper, we present Reed-Solomon coded single-stranded representation learning (RSRL), a novel end-to-end model that learns representations for multi-modal lossless DNA storage. In contrast to existing learning-based methods, RSRL draws on both error-correction coding and structural biology. Specifically, RSRL first learns representations for subsequent storage from binary data transformed by a Reed-Solomon codec. These representations are then masked by an RS-code-informed mask so that training focuses on correcting the burst errors that arise during learning. Using the decoded, error-corrected representations, a novel biologically stabilized loss is formulated to regularize the representations toward stable single-stranded structures. By combining these strategies, RSRL learns highly durable, dense, and lossless representations for subsequent storage in DNA sequences. RSRL has been compared with a number of strong baselines on real-world multi-modal data-storage tasks. The experimental results demonstrate that RSRL stores diverse types of data with much higher information density and durability, and with much lower error rates.
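
The abstract names three ingredients without specifying their implementation: Reed-Solomon-coded input data, an RS-code-informed mask, and a biologically stabilized (structural-stability) objective. The sketch below is a minimal, hypothetical illustration of what each ingredient could look like; it is not the paper's method. It assumes the third-party reedsolo package and PyTorch, and the byte-to-base mapping, parity-focused mask shape, and hairpin-counting penalty are illustrative assumptions.

```python
# Illustrative sketch only (not RSRL's actual code): RS-coded data preparation,
# a parity-focused "RS-informed" mask, and a toy single-strand stability penalty.
import torch
from reedsolo import RSCodec  # assumed dependency: pip install reedsolo

NUCLEOTIDES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def rs_encode_bytes(payload: bytes, nsym: int = 16) -> bytes:
    """Append nsym Reed-Solomon parity symbols; bursts of up to nsym//2 symbols are correctable."""
    return bytes(RSCodec(nsym).encode(payload))

def bytes_to_dna(codeword: bytes) -> str:
    """Map each byte to four nucleotides (2 bits per base) -- a simple fixed transcoding."""
    bases = []
    for byte in codeword:
        for shift in (6, 4, 2, 0):
            bases.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(bases)

def rs_informed_mask(num_symbols: int, nsym: int = 16) -> torch.Tensor:
    """Binary mask over codeword symbols that emphasizes the parity tail,
    a crude stand-in for steering training toward burst-error correction."""
    mask = torch.zeros(num_symbols)
    mask[-nsym:] = 1.0
    return mask

def hairpin_penalty(seq: str, stem: int = 6) -> float:
    """Toy stability term: count windows whose reverse complement reappears downstream,
    a rough proxy for hairpin-prone (unstable) single strands."""
    rc = lambda s: "".join(COMPLEMENT[b] for b in reversed(s))
    hits = sum(rc(seq[i:i + stem]) in seq[i + stem:] for i in range(len(seq) - stem))
    return hits / max(len(seq) - stem, 1)

if __name__ == "__main__":
    codeword = rs_encode_bytes(b"multi-modal payload chunk")
    dna = bytes_to_dna(codeword)
    mask = rs_informed_mask(len(codeword))
    print(len(codeword), len(dna), int(mask.sum()), round(hairpin_penalty(dna), 3))
```

In the paper the mask and stability term act on learned representations inside an end-to-end model rather than on raw byte strings as above; the sketch only fixes the intuition that parity symbols absorb burst errors and that self-complementary subsequences signal structurally unstable single strands.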