Dna-storalator: a computational simulator for DNA data storage.

Impact factor 3.3; Zone 3 (Biology); JCR Q2, Biochemical Research Methods

BMC Bioinformatics Pub Date : 2025-08-04 DOI:10.1186/s12859-025-06222-0

Gadi Chaykin, Omer Sabary, Nili Furman, Dvir Ben Shabat, Eitan Yaakobi

BMC Bioinformatics 26(1): 204. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12323093/pdf/


Abstract

Background: DNA data storage is an emerging technology that has caught the attention of many researchers and engineers. It uses DNA molecules as the storage medium and therefore offers an extremely dense and durable storage device. However, the errors that occur in DNA include insertions and deletions as well as substitutions, and this unique error profile requires new algorithmic and coding solutions for these storage systems.
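To illustrate why insertions and deletions call for different tools than substitutions, the sketch below compares a position-by-position (Hamming-style) mismatch count with the Levenshtein edit distance after a single deletion. The strand, the helper names, and the choice of metrics are illustrative assumptions, not part of the DNA-Storalator.

```python
def hamming_like(a: str, b: str) -> int:
    """Count position-wise mismatches, plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


designed = "ACGTACGTAC"                 # hypothetical designed strand
read = designed[:2] + designed[3:]      # the same strand with its third base deleted

print(hamming_like(designed, read), levenshtein(designed, read))
```

One deleted base shifts every later position, so the position-wise count is large even though the edit distance is 1. Codes and reconstruction algorithms designed only for substitutions would treat such a read as heavily corrupted.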

Results: The DNA-Storalator is a cross-platform software tool that simulates, from a simplified digital point of view, the biological and computational processes involved in storing data in DNA molecules. The simulator receives an input file containing the designed DNA strands that store the digital data, and it emulates the biological and algorithmic components of a DNA-based storage system. The biological component simulates the synthesis, PCR, and sequencing stages, which are expensive and complicated and therefore not widely accessible to the community. These processes amplify the data and generate noisy copies of each DNA strand, in which the errors are insertions, deletions, long deletions, and substitutions. The DNA-Storalator injects errors into the data according to error rates, which vary between synthesis and sequencing technologies. The default rates are based on a comprehensive analysis of data from previous experiments, but they can also be customized. In addition, the tool can analyze new datasets and characterize their error rates in order to build new error models for later use in the simulator. The DNA-Storalator also gives control over the amplification process and over the distribution of the number of copies per designed strand. The coding and algorithmic components are: (1) clustering algorithms, which partition the noisy output strands into groups according to the designed strand each one originated from; (2) state-of-the-art reconstruction algorithms, which are invoked on each cluster to output a close or exact estimate of the designed strand; and (3) integration with external error-correcting codes and other encoding and decoding techniques.
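The error-injection and amplification steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DNA-Storalator's actual implementation: it assumes independent per-base errors, a uniformly chosen long-deletion length, and a Gaussian copy-number distribution, and every name and rate value in it is hypothetical.

```python
import random

BASES = "ACGT"


def noisy_copy(strand, rates, rng, max_burst=4):
    """Return one noisy copy of `strand`.

    `rates` maps 'del', 'long_del', 'sub', 'ins' to per-base probabilities
    (assumed independent per position; a simplification).
    """
    t_del = rates["del"]
    t_long = t_del + rates["long_del"]
    t_sub = t_long + rates["sub"]
    t_ins = t_sub + rates["ins"]
    out, i = [], 0
    while i < len(strand):
        r = rng.random()
        if r < t_del:                       # single-base deletion
            i += 1
        elif r < t_long:                    # long deletion: skip a short burst
            i += rng.randint(2, max_burst)
        elif r < t_sub:                     # substitution by a different base
            out.append(rng.choice([b for b in BASES if b != strand[i]]))
            i += 1
        elif r < t_ins:                     # insertion before the current base
            out.append(rng.choice(BASES))
            out.append(strand[i])
            i += 1
        else:                               # base copied without error
            out.append(strand[i])
            i += 1
    return "".join(out)


def simulate(strands, rates, mean_copies=10, seed=0):
    """Map each designed strand to a list of noisy copies (its true cluster)."""
    rng = random.Random(seed)
    clusters = {}
    for s in strands:
        # Assumed copy-number model: rounded Gaussian, clipped at zero.
        n = max(0, round(rng.gauss(mean_copies, mean_copies / 4)))
        clusters[s] = [noisy_copy(s, rates, rng) for _ in range(n)]
    return clusters


# Hypothetical rates, chosen only for illustration.
example_rates = {"del": 0.01, "long_del": 0.002, "sub": 0.01, "ins": 0.005}
clusters = simulate(["ACGTACGTACGTACGT", "TTGACCAGTTGACCAG"], example_rates)
```

In a full pipeline, the copies would be shuffled together and handed to a clustering algorithm, and each recovered cluster would then go to a reconstruction algorithm. Keeping the true clusters, as `simulate` does here, makes it possible to score both steps against the ground truth.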

Conclusions: The proposed DNA storage simulator gives researchers from all fields an accessible, complete tool for examining new biological technologies, coding techniques, and algorithms for current and future DNA storage systems.


Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70

Self-citation rate: 3.30%

Articles published: 506

Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing, and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series, which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair, and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.