Exact Error Exponents of Concatenated Codes for DNA Storage
Yan Hao Ling, Jonathan Scarlett
arXiv - MATH - Information Theory, 2024-09-02
DOI: arxiv-2409.01223 (https://doi.org/arxiv-2409.01223)
Abstract
In this paper, we consider a class of DNA storage codes based on concatenated
coding, in which the selected molecules are constrained to be taken from an
"inner" codebook associated with the sequencing channel. This codebook is
used in a "black-box" manner, and is only assumed to operate at an achievable
rate in the sense of attaining asymptotically vanishing maximal (inner) error
probability. We first derive the exact error exponent in a widely studied
regime of constant rate and a linear number of sequencing reads, and show
strict improvements over an existing achievable error exponent. Moreover, our
achievability analysis is based on a coded-index strategy, implying that such
strategies attain the highest error exponents within the broader class of codes
that we consider. We then extend our results to other scaling regimes,
including a super-linear number of reads, as well as several low-rate
regimes. We find that the latter come with notable intricacies, such as the
suboptimality of codewords with all distinct molecules, and certain
dependencies of the error exponents on the model for sequencing errors.