Multi-Mode Data Organization and File Retrieval Based on a Primer Library in Large-Scale Digital DNA Storage
Shu-Fang Zhang, Yu-Hui Li, Rui-Xian Zhang, Bing-Zhi Li, Qing Wang
Engineering, Vol. 48, pp. 151-162 (May 2025). DOI: 10.1016/j.eng.2023.10.021. https://www.sciencedirect.com/science/article/pii/S2095809924006404
At present, file retrieval based on polymerase chain reaction (PCR) amplification is the most commonly used and effective means of accessing files stored in DNA. The number of available orthogonal primers limits the number of files that can be accurately accessed, which in turn limits the data density achievable in a single oligo pool for digital DNA storage. In this paper, a multi-mode DNA sequence design method based on PCR file retrieval within a single oligonucleotide pool is proposed for high-capacity DNA data storage. First, by analyzing the maximum number of orthogonal primers at each predicted primer length, it was found that the maximum number of available primers does not increase linearly with primer length, and that the maximum number of orthogonal primers is on the order of $10^4$. Next, this paper analyzes the maximum address-space capacity of DNA sequences with different types of primer binding sites for file mapping. For a primer library of capacity $R$ (where $R$ is even), the single-primer DNA sequence design scheme proposed in this paper can map four times as many address spaces as the previous scheme, and the two-level primer DNA sequence design scheme can reach $\left(\frac{R}{2}\cdot\left(\frac{R}{2}-1\right)\right)^{2}$ times that of the previous scheme. Finally, a multi-mode DNA sequence generation method is designed according to the number of files to be stored in the oligonucleotide pool, to support random retrieval of target files from a pool containing a large number of files. The performance of the primers produced by the proposed orthogonal primer library generator is verified: the average Gibbs free energy of the most stable heterodimer formed between the generated orthogonal primers is $-1\ \mathrm{kcal\cdot(mol\cdot L^{-1})^{-1}}$ (1 kcal = 4.184 kJ).
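To get a feel for the scale of the two-level result, the short sketch below simply evaluates the expression $\left(\frac{R}{2}\cdot\left(\frac{R}{2}-1\right)\right)^{2}$ for a few even library sizes $R$. The function name `two_level_factor` is hypothetical, and the code only computes the closed-form expression; it does not model how the paper actually assigns primers to binding sites.

```python
def two_level_factor(r: int) -> int:
    """Evaluate ((R/2) * (R/2 - 1))**2 for an even primer-library size R.

    Only the closed-form expression is computed here; the primer-to-binding-site
    assignment behind it is not modeled.
    """
    if r <= 0 or r % 2 != 0:
        raise ValueError("R must be a positive even integer")
    half = r // 2
    return (half * (half - 1)) ** 2


if __name__ == "__main__":
    # Illustrative library sizes; 10**4 is the order of magnitude of the
    # orthogonal primer count reported in the abstract.
    for r in (10, 100, 10**4):
        factor = two_level_factor(r)
        print(f"R = {r:>6}: factor = {factor:,} (about {factor:.3e})")
```

For $R = 10^4$ the expression gives $(5000 \cdot 4999)^2 = 624{,}750{,}025{,}000{,}000 \approx 6.25 \times 10^{14}$, which shows how quickly the two-level figure grows with library size compared with the constant factor of four for the single-primer scheme.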
In addition, when DNA sequences carrying two-level primer binding sites are selectively amplified by PCR for random access, the target sequence can be accurately read with as few as $10^3$ reads, provided that the primer binding site sequences at different positions are distinct from one another. This paper provides a pipeline for orthogonal primer library generation and multi-mode mapping schemes between files and primers, which can help achieve precise random access to files in large-scale DNA oligo pools.
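The two-level random-access idea can be illustrated with an idealized in-silico sketch. This is not the paper's protocol: it assumes a hypothetical oligo layout `outer_F | inner_F | payload | rc(inner_R) | rc(outer_R)`, models PCR selection as exact prefix/suffix matching (no mismatches, mispriming, or amplification bias), and uses arbitrary 6-nt toy primers that have not been checked for orthogonality or dimer free energy. The names `Address`, `build_oligo`, `select`, `trim_sites`, and `retrieve` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Watson-Crick complement table for uppercase A/C/G/T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class Address:
    """Hypothetical two-level file address: an outer and an inner primer pair."""
    outer_fwd: str
    outer_rev: str
    inner_fwd: str
    inner_rev: str


def build_oligo(addr: Address, payload: str) -> str:
    """Assumed layout: outer_F | inner_F | payload | rc(inner_R) | rc(outer_R)."""
    return (addr.outer_fwd + addr.inner_fwd + payload
            + revcomp(addr.inner_rev) + revcomp(addr.outer_rev))


def select(reads: list[str], fwd: str, rev: str) -> list[str]:
    """Idealized PCR selection: keep reads that start with fwd and end with rc(rev)."""
    tail = revcomp(rev)
    return [r for r in reads if r.startswith(fwd) and r.endswith(tail)]


def trim_sites(reads: list[str], fwd: str, rev: str) -> list[str]:
    """Remove the binding sites matched in the previous selection round."""
    return [r[len(fwd):len(r) - len(rev)] for r in reads]


def retrieve(pool: list[str], addr: Address) -> list[str]:
    """Two selection rounds: outer primer pair first, then inner primer pair."""
    level1 = trim_sites(select(pool, addr.outer_fwd, addr.outer_rev),
                        addr.outer_fwd, addr.outer_rev)
    level2 = trim_sites(select(level1, addr.inner_fwd, addr.inner_rev),
                        addr.inner_fwd, addr.inner_rev)
    return level2


if __name__ == "__main__":
    # Arbitrary 6-nt toy primers; NOT checked for orthogonality or dimer energy.
    a = Address("ACGTAC", "TTGCAG", "GATCCA", "CCATGG")
    b = Address("ACGTAC", "TTGCAG", "TGCATG", "AGGTCA")  # same outer pair as a
    c = Address("GGATCC", "CTAGTT", "GATCCA", "CCATGG")  # same inner pair as a
    pool = [build_oligo(a, "AAAATTTT"),
            build_oligo(b, "CCCCGGGG"),
            build_oligo(c, "GGGGAAAA")]
    print(retrieve(pool, a))  # ['AAAATTTT']
```

Files `a` and `b` share an outer pair and `a` and `c` share an inner pair, yet each file is still recovered on its own, because only the combination of both levels identifies it. Real retrieval depth (such as the $10^3$ reads reported above) depends on mispriming, amplification bias, and sequencing errors that this sketch deliberately ignores.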
About the journal:
Engineering, an international open-access journal initiated by the Chinese Academy of Engineering (CAE) in 2015, serves as a distinguished platform for disseminating cutting-edge advancements in engineering R&D, sharing major research outputs, and highlighting key achievements worldwide. The journal's objectives encompass reporting progress in engineering science, fostering discussions on hot topics, addressing areas of interest, challenges, and prospects in engineering development, while considering human and environmental well-being and ethics in engineering. It aims to inspire breakthroughs and innovations with profound economic and social significance, propelling them to advanced international standards and transforming them into a new productive force. Ultimately, this endeavor seeks to bring about positive changes globally, benefit humanity, and shape a new future.