Title: On the Long-Term Behavior of k-Tuples Frequencies in Mutation Systems
Author: Ohad Elishco
Journal: IEEE Transactions on Information Theory, vol. 70, no. 12, pp. 8524-8545
Published: 2024-09-09
DOI: 10.1109/TIT.2024.3456597
URL: https://ieeexplore.ieee.org/document/10670057/
Impact factor: 2.2 (JCR Q3, Computer Science, Information Systems)
Citation count: 0
Abstract
In response to the evolving landscape of data storage, researchers have increasingly explored non-traditional platforms, with DNA-based storage emerging as a cutting-edge solution. Our work is motivated by the potential of in-vivo DNA storage, known for its capacity to store vast amounts of information efficiently and confidentially within an organism’s native DNA. While promising, in-vivo DNA storage faces challenges, including susceptibility to errors introduced by mutations. One way to understand the long-term effect of such mutations on the stored information is to investigate the frequency of k-tuples after multiple mutations. Drawing inspiration from related works, we generalize results from the study of duplication systems, particularly focusing on the frequency (or proportion) of k-tuples. We provide a general method for the analysis of mutation systems through the construction of a specialized matrix, dubbed the substitution matrix, and the identification of its eigenvectors. Specifically, we derive an expression for the expected frequency of k-tuples. In the context of duplication errors, we leverage existing results on the almost sure convergence of the frequency of k-tuples. This allows us to equate the expected frequency of k-tuples to the limiting frequency of k-tuples. In addition, we demonstrate the convergence in probability of the frequency of k-tuples under certain assumptions.
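To make the central quantity concrete, the following Python sketch estimates k-tuple frequencies empirically after repeated random tandem duplications. It is an illustration only and not the paper's method: the paper analyzes expected frequencies through a substitution matrix and its eigenvectors, whereas this sketch simply simulates one sample path. The function names (`ktuple_frequencies`, `tandem_duplicate`, `simulate`), the uniform choice of duplication position, and the non-cyclic sliding-window convention are all assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict


def ktuple_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Return the proportion of each k-tuple among all length-k windows of seq."""
    if k <= 0 or k > len(seq):
        raise ValueError("k must satisfy 1 <= k <= len(seq)")
    windows = len(seq) - k + 1  # number of (non-cyclic) sliding windows
    counts = Counter(seq[i:i + k] for i in range(windows))
    return {tup: c / windows for tup, c in counts.items()}


def tandem_duplicate(seq: str, dup_len: int, rng: random.Random) -> str:
    """Copy a uniformly chosen substring of length dup_len and insert the copy right after it.

    The uniform position choice is an assumption of this sketch.
    """
    if dup_len <= 0 or dup_len > len(seq):
        raise ValueError("dup_len must satisfy 1 <= dup_len <= len(seq)")
    start = rng.randrange(len(seq) - dup_len + 1)
    segment = seq[start:start + dup_len]
    return seq[:start + dup_len] + segment + seq[start + dup_len:]


def simulate(seed_seq: str, steps: int, dup_len: int, k: int, seed: int = 0) -> Dict[str, float]:
    """Apply `steps` random tandem duplications, then report the k-tuple frequencies."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    seq = seed_seq
    for _ in range(steps):
        seq = tandem_duplicate(seq, dup_len, rng)
    return ktuple_frequencies(seq, k)


if __name__ == "__main__":
    # Example: 2-tuple frequencies after 1000 single-symbol duplications of "ACGT".
    for tup, freq in sorted(simulate("ACGT", 1000, 1, 2, seed=1).items()):
        print(tup, round(freq, 4))
```

Running `simulate` with several different seeds and increasing `steps` gives an informal view of how stable the empirical frequencies are across sample paths, which is the kind of long-term behavior the abstract refers to.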
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.