DNA-Storalator: a computational simulator for DNA data storage

Impact factor 3.3; CAS Zone 3 (Biology); JCR Q2, Biochemical Research Methods
Gadi Chaykin, Omer Sabary, Nili Furman, Dvir Ben Shabat, Eitan Yaakobi
BMC Bioinformatics 26(1):204, published 2025-08-04. DOI: 10.1186/s12859-025-06222-0
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323093/pdf/
Citations: 0

Abstract

Background: DNA data storage is an emerging technology that has attracted the attention of many researchers and engineers. This technology uses DNA molecules as a storage medium and thus offers extremely dense and durable storage. However, the unique nature of the errors in DNA, which include insertions, deletions, and substitutions, requires the development of new algorithmic and coding solutions for these storage systems.
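Why insertions and deletions call for new tools can be seen in a few lines of Python. The sketch below is illustrative only and not part of the DNA-Storalator; the function names `hamming_mismatches` and `edit_distance` are hypothetical. A single deleted base shifts every later position, so a position-by-position comparison reports many mismatches, while the Levenshtein (edit) distance, which counts insertions, deletions and substitutions, reports just one.

```python
def hamming_mismatches(a: str, b: str) -> int:
    """Count index-by-index mismatches, plus the length difference."""
    n = min(len(a), len(b))
    return sum(1 for i in range(n) if a[i] != b[i]) + abs(len(a) - len(b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


designed = "ACGTACGTAC"
read = designed[:2] + designed[3:]  # one deletion at index 2 gives "ACTACGTAC"
print(hamming_mismatches(designed, read))  # -> 8
print(edit_distance(designed, read))       # -> 1
```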

Results: The DNA-Storalator is a cross-platform software tool that simulates, from a simplified digital point of view, the biological and computational processes involved in storing data in DNA molecules. The simulator receives an input file containing the designed DNA strands that encode the digital data, and it emulates the different biological and algorithmic components of a DNA-based storage system. The biological component simulates the synthesis, PCR, and sequencing stages, which are expensive and complicated and therefore not widely accessible to the community. These processes amplify the data and generate noisy copies of each DNA strand, where the errors are insertions, deletions, long deletions, and substitutions. The DNA-Storalator injects errors into the data according to error rates, which vary between different synthesis and sequencing technologies. These rates are based on a comprehensive analysis of data from previous experiments, but they can also be customized. Additionally, the tool can analyze new datasets and characterize their error rates to build new error models for future use in the simulator. The DNA-Storalator also enables control of the amplification process and of the distribution of the number of copies per designed strand. The coding and algorithmic components are:
1. Clustering algorithms, which partition all output noisy strands into groups according to the designed strand they originated from;
2. State-of-the-art reconstruction algorithms, which are invoked on each cluster to output a close or exact estimate of the designed strand;
3. Integration with external error-correcting codes and other encoding and decoding techniques.
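The error-injection and amplification steps of the Results can be pictured as a simple channel model. This is a minimal sketch under loud assumptions, not the DNA-Storalator's implementation: errors are drawn independently per base, a long deletion removes a run of 2 to `max_long_del` bases, an insertion adds a uniformly random base, and copy numbers follow a rounded normal distribution clipped at zero. The names `noisy_copy` and `simulate_reads` and all rate values are hypothetical; the tool itself uses rates derived from previous experiments or supplied by the user.

```python
import random

BASES = "ACGT"


def noisy_copy(strand, p_ins, p_del, p_sub, p_long_del=0.0, max_long_del=5, rng=random):
    """Return one noisy copy of `strand` under an illustrative i.i.d. per-base model.

    At each designed position exactly one of these happens: a long deletion
    (p_long_del), a single deletion (p_del), a substitution (p_sub), or a
    correct copy (remaining probability). After a base is emitted, a random
    base is inserted with probability p_ins.
    Assumes p_long_del + p_del + p_sub <= 1 and max_long_del >= 2.
    """
    out = []
    i = 0
    while i < len(strand):
        r = rng.random()
        if r < p_long_del:
            i += rng.randint(2, max_long_del)  # drop a run of consecutive bases
            continue
        r -= p_long_del
        if r < p_del:
            i += 1  # single-base deletion
            continue
        r -= p_del
        if r < p_sub:
            out.append(rng.choice([b for b in BASES if b != strand[i]]))  # substitution
        else:
            out.append(strand[i])  # correct copy
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after the emitted base
        i += 1
    return "".join(out)


def simulate_reads(strands, mean_copies, sd_copies, rates, seed=0):
    """Simulate unordered sequencer output as (origin_index, read) pairs.

    Copy numbers per designed strand are drawn from a normal distribution,
    rounded and clipped at zero (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    reads = []
    for idx, strand in enumerate(strands):
        n_copies = max(0, round(rng.gauss(mean_copies, sd_copies)))
        for _ in range(n_copies):
            reads.append((idx, noisy_copy(strand, rng=rng, **rates)))
    rng.shuffle(reads)  # sequencing returns reads in no particular order
    return reads


designed = ["ACGTACGTACGTACGTACGT", "TTGCAAGCTTGCAAGCTTGC"]
rates = {"p_ins": 0.01, "p_del": 0.01, "p_sub": 0.01, "p_long_del": 0.002}
for origin, read in simulate_reads(designed, mean_copies=5, sd_copies=1.5, rates=rates, seed=1):
    print(origin, read)
```

Setting every rate to zero yields exact copies, which is a convenient sanity check; raising `sd_copies` widens the spread of cluster sizes, loosely mimicking uneven amplification.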
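Characterizing the error rates of a new dataset requires knowing which operations turned each designed strand into its read. One way to sketch this, assuming the origin of every read is already known, is to fill a Levenshtein table, trace back one optimal alignment, and count insertions, deletions and substitutions per designed base. The names `count_edit_operations` and `estimate_error_rates` are hypothetical, and the DNA-Storalator's own analysis may use a different alignment or error model.

```python
def count_edit_operations(designed, read):
    """Align `read` to `designed` and count ins/del/sub along one optimal path."""
    n, m = len(designed), len(read)
    # dp[i][j] = edit distance between designed[:i] and read[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (designed[i - 1] != read[j - 1]))
    # Trace back from the bottom-right corner, preferring the diagonal.
    counts = {"ins": 0, "del": 0, "sub": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (designed[i - 1] != read[j - 1]):
            if designed[i - 1] != read[j - 1]:
                counts["sub"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            counts["del"] += 1  # designed base missing from the read
            i -= 1
        else:
            counts["ins"] += 1  # extra base present in the read
            j -= 1
    return counts


def estimate_error_rates(pairs):
    """Per-designed-base error rates from (designed, read) pairs with known origin."""
    totals = {"ins": 0, "del": 0, "sub": 0}
    bases = 0
    for designed, read in pairs:
        for op, c in count_edit_operations(designed, read).items():
            totals[op] += c
        bases += len(designed)
    return {op: c / bases for op, c in totals.items()}


pairs = [("ACGTACGTAC", "ACGAACGTC"), ("TTGCAAGCTT", "TTGCAAGCTT")]
print(count_edit_operations(*pairs[0]))  # -> {'ins': 0, 'del': 1, 'sub': 1}
print(estimate_error_rates(pairs))       # -> {'ins': 0.0, 'del': 0.05, 'sub': 0.05}
```

Because optimal alignments are not unique, the traceback returns one minimal explanation of the differences; its total is a lower bound on the number of error events that actually occurred, so dense errors are undercounted.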
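The clustering step groups the unordered reads by the designed strand they came from. A naive greedy baseline, shown only to make the task concrete, compares each read with the first member of every existing cluster and joins the first cluster within a chosen edit-distance threshold. The threshold value and the names `greedy_cluster` and `edit_distance` are assumptions; this approach is quadratic in the number of reads, does not scale to real datasets, and should not be read as the clustering algorithm implemented in the DNA-Storalator.

```python
def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions and substitutions all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def greedy_cluster(reads, threshold):
    """Each read joins the first cluster whose representative (its first read)
    is within `threshold` edits; otherwise it starts a new cluster."""
    clusters = []
    for read in reads:
        for members in clusters:
            if edit_distance(read, members[0]) <= threshold:
                members.append(read)
                break
        else:  # no cluster was close enough
            clusters.append([read])
    return clusters


reads = ["ACACACACAC", "GTGTGTGTGT", "ACACACCAC", "GTGTTGTGT", "ACACACACTC"]
for cluster in greedy_cluster(reads, threshold=3):
    print(cluster)
```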
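Once reads are clustered, a reconstruction algorithm estimates the designed strand from each cluster. As an illustration, the sketch below implements a simplified form of bitwise majority alignment (BMA), a known trace-reconstruction heuristic, restricted to deletion-only errors and adapted to the four-letter DNA alphabet. It is not one of the state-of-the-art algorithms included in the DNA-Storalator; the name `bma_reconstruct` is hypothetical, and the designed length is assumed to be known.

```python
from collections import Counter


def bma_reconstruct(cluster, length):
    """Estimate a designed strand from deletion-only noisy copies (BMA-style sketch)."""
    pointers = [0] * len(cluster)  # current read position in each copy
    estimate = []
    for _ in range(length):
        # Symbols under each pointer; exhausted copies are skipped.
        current = [t[p] for t, p in zip(cluster, pointers) if p < len(t)]
        if not current:
            break
        symbol = Counter(current).most_common(1)[0][0]  # majority (ties broken arbitrarily)
        estimate.append(symbol)
        # Copies agreeing with the majority advance; the others are assumed to
        # have lost this symbol to a deletion, so their pointers stay put.
        for k, t in enumerate(cluster):
            if pointers[k] < len(t) and t[pointers[k]] == symbol:
                pointers[k] += 1
    return "".join(estimate)


traces = ["AGTTGCA", "ACGTTCA", "ACGTTGCA", "ACGTGCA", "ACGTTGCA"]
print(bma_reconstruct(traces, 8))  # -> ACGTTGCA
```

A copy that disagrees with the majority is assumed to have lost the current symbol, so its pointer waits; with insertions or substitutions this assumption breaks down, which is why practical reconstruction algorithms are considerably more involved.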

Conclusions: The proposed computational DNA storage simulator gives researchers from all fields an accessible and complete simulator for examining new biological technologies, coding techniques, and algorithms for current and future DNA storage systems.

Source journal: BMC Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.