A Joint DNA Encoding Approach Based on LZW and Arithmetic Encoding
Zhongyang Cheng; Qiang Liu; Kun Yang
IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, vol. 11, no. 2, pp. 237-245, published 2025-04-03. DOI: 10.1109/TMBMC.2025.3556858 (https://ieeexplore.ieee.org/document/10948464/)
Molecular communication (MC) is a novel communication paradigm that uses nanoengineering and bioengineering technology to establish transient communication links in challenging environments. Deoxyribonucleic acid (DNA) molecular communication can carry more data, at higher rates, than traditional molecular communication. DNA has also been shown to offer significant advantages over traditional information carriers, including high storage density and structural stability, which make it an ideal medium for information transmission. It is therefore important to investigate ways of increasing the information density of DNA encoding in order to reduce costs and improve overall performance. Lempel-Ziv-Welch (LZW) coding builds a string table in which shorter codes represent longer strings. Arithmetic coding compresses an input stream by successively narrowing an interval according to the probabilities of the input symbols. A notable drawback of LZW coding is its suboptimal compression efficiency: redundancy remains in the data after dictionary mapping. Arithmetic coding, by contrast, attains compression efficiency that approaches the Shannon limit. In this study, we propose a novel DNA encoding method that adaptively generates coding streams according to the characteristics of the stored content. The contributions of this paper are as follows: 1) a bespoke coding dictionary is constructed that generates a coding stream tailored to the specific characteristics of the file to be stored; 2) these coding streams are compressed with arithmetic coding and converted into the final DNA sequence. Comprehensive verification shows that the information density of this encoding method is markedly higher than that of prevailing mainstream encoding schemes.
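The general pipeline described in the abstract (dictionary coding, then arithmetic coding, then a mapping to nucleotides) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify their dictionary construction, probability model, or base mapping. The sketch assumes textbook LZW over bytes, a static arithmetic coder whose frequency table is computed from the LZW codes, and a hypothetical 2-bits-per-base mapping (00→A, 01→C, 10→G, 11→T). All function names (`lzw_encode`, `arith_encode`, `bits_to_dna`, and so on) are illustrative.

```python
import math
from collections import Counter
from fractions import Fraction


def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit the code of the longest known prefix, then add prefix + next byte."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    w, out = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc  # keep extending the current match
        else:
            out.append(table[w])
            table[wc] = len(table)  # the new, longer string gets the next free code
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same table on the fly, so no dictionary has to be transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]  # code not yet in the table: the "cScSc" special case
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = w + entry[:1]
        w = entry
    return b"".join(out)


def _cumulative(freq: dict[int, int]) -> dict[int, int]:
    """Start of each symbol's sub-interval, in units of 1/total."""
    cum, running = {}, 0
    for sym in sorted(freq):
        cum[sym] = running
        running += freq[sym]
    return cum


def arith_encode(symbols: list[int]) -> tuple[str, dict[int, int]]:
    """Static arithmetic coding with exact fractions (no renormalisation; toy sizes only)."""
    freq = dict(Counter(symbols))
    total, cum = len(symbols), _cumulative(freq)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        # Narrow [low, low + width) to the sub-interval assigned to s.
        low += width * Fraction(cum[s], total)
        width *= Fraction(freq[s], total)
    # Shortest k-bit binary fraction m / 2**k that lies inside the final interval.
    k = 0
    while True:
        m = math.ceil(low * 2**k)
        if Fraction(m, 2**k) < low + width:
            break
        k += 1
    bits = "".join(str((m >> (k - 1 - i)) & 1) for i in range(k))
    return bits, freq


def arith_decode(bits: str, freq: dict[int, int], n: int) -> list[int]:
    """Replay the interval narrowing, picking the symbol whose sub-interval holds the value."""
    total, cum = sum(freq.values()), _cumulative(freq)
    value = Fraction(int(bits, 2), 2 ** len(bits)) if bits else Fraction(0)
    low, width, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        r = (value - low) / width  # position of value inside the current interval, in [0, 1)
        s = next(sym for sym in cum if cum[sym] <= r * total < cum[sym] + freq[sym])
        out.append(s)
        low += width * Fraction(cum[s], total)
        width *= Fraction(freq[s], total)
    return out


def bits_to_dna(bits: str) -> str:
    """Hypothetical 2-bits-per-base mapping; ignores GC-content and homopolymer constraints."""
    bits += "0" * (len(bits) % 2)  # pad to an even length
    table = {"00": "A", "01": "C", "10": "G", "11": "T"}
    return "".join(table[bits[i:i + 2]] for i in range(0, len(bits), 2))


data = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(data)
bits, freq = arith_encode(codes)
dna = bits_to_dna(bits)
print(len(data) * 8, "input bits ->", len(bits), "coded bits ->", len(dna), "bases")
```

Exact `Fraction` arithmetic keeps the coder short and exactly invertible, but its cost grows with message length; practical arithmetic coders use fixed-precision integer intervals with renormalisation. The decoder here also needs the frequency table and symbol count as side information, and a practical DNA mapping would add biochemical constraints such as balanced GC content and limited homopolymer runs.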
Journal description:
As a result of recent advances in MEMS/NEMS and systems biology, as well as the emergence of synthetic bacteria and lab/process-on-a-chip techniques, it is now possible to design chemical “circuits”, custom organisms, micro/nanoscale swarms of devices, and a host of other new systems. This success opens up a new frontier for interdisciplinary communications techniques using chemistry, biology, and other principles that have not been considered in the communications literature. The IEEE Transactions on Molecular, Biological, and Multi-Scale Communications (T-MBMSC) is devoted to the principles, design, and analysis of communication systems that use physics beyond classical electromagnetism. This includes molecular, quantum, and other physical, chemical and biological techniques; as well as new communication techniques at small scales or across multiple scales (e.g., nano to micro to macro; note that strictly nanoscale systems, 1-100 nm, are outside the scope of this journal). Original research articles on one or more of the following topics are within scope: mathematical modeling, information/communication and network theoretic analysis, standardization and industrial applications, and analytical or experimental studies on communication processes or networks in biology. Contributions on related topics may also be considered for publication. Contributions from researchers outside the IEEE’s typical audience are encouraged.