Linear-Time Encoders for Codes Correcting a Single Edit for DNA-Based Data Storage
Y. M. Chee, H. M. Kiah, T. T. Nguyen
2019 IEEE International Symposium on Information Theory (ISIT), pp. 772-776, 2019-07-07
DOI: 10.1109/ISIT.2019.8849643 (https://doi.org/10.1109/ISIT.2019.8849643)
An indel refers to a single insertion or deletion, while an edit refers to either a single insertion, deletion or substitution. We investigate codes that combat either a single indel or a single edit and provide linear-time algorithms that encode binary messages into these codes of length n. Over the quaternary alphabet, we provide two linear-time encoders. One corrects a single edit with 2⌈log n⌉ + 2 redundant bits, while the other corrects a single indel with ⌈log n⌉ + 2 redundant bits. The latter encoder reduces the redundancy of the best known encoder of Tenengolts (1984) by at least four bits. Over the DNA alphabet, exactly half of the symbols of a GC-balanced word are either C or G. Via a modification of Knuth’s balancing technique, we provide a linear-time map that translates binary messages into GC-balanced codewords and the resulting codebook is able to correct a single edit. The redundancy of our encoder is 3⌈log n⌉ + 2 bits and this is the first known construction of a GC-balanced code that corrects a single edit.
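To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the authors' encoder. It shows three things: a check of the GC-balance property as defined above, the classical binary form of Knuth's balancing idea (flip a prefix until the word has equally many 0s and 1s), and the three redundancy expressions evaluated for a given length n. The function names are hypothetical, the logarithm is assumed to be base 2, and the paper's modification of Knuth's technique is not reproduced here.

```python
def is_gc_balanced(word: str) -> bool:
    """Return True if exactly half of the symbols of `word` are C or G."""
    gc_count = sum(1 for symbol in word if symbol in "CG")
    return len(word) % 2 == 0 and 2 * gc_count == len(word)


def knuth_balancing_index(bits: list[int]) -> int:
    """Classical Knuth balancing (binary, even length only).

    Returns the smallest k such that flipping the first k bits gives a
    word with equally many 0s and 1s. Such a k always exists because the
    weight changes by exactly one per extra flipped bit and moves from
    w (k = 0) to n - w (k = n), so it must pass through n / 2.
    """
    n = len(bits)
    if n % 2 != 0:
        raise ValueError("balancing requires an even-length word")
    ones = sum(bits)
    for k in range(n + 1):
        if 2 * ones == n:
            return k
        if k < n:
            # Flipping bit k turns a 0 into a 1 (+1) or a 1 into a 0 (-1).
            ones += 1 - 2 * bits[k]
    raise AssertionError("unreachable for even-length input")


def redundancy_bits(n: int) -> dict[str, int]:
    """Evaluate the redundancy formulas from the abstract.

    Assumption: log is base 2. (n - 1).bit_length() equals ceil(log2(n))
    for n >= 1 and avoids floating-point rounding.
    """
    ceil_log = (n - 1).bit_length()
    return {
        "quaternary_single_edit": 2 * ceil_log + 2,
        "quaternary_single_indel": ceil_log + 2,
        "gc_balanced_single_edit": 3 * ceil_log + 2,
    }
```

For example, `knuth_balancing_index([1, 1, 1, 1])` flips the first two bits to reach the balanced word 0011, and `redundancy_bits(128)` evaluates the three formulas with ⌈log n⌉ = 7 under the base-2 assumption.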