Title: On the Capacity of DNA Labeling
Authors: Dganit Hanania; Daniella Bar-Lev; Yevgeni Nogin; Yoav Shechtman; Eitan Yaakobi
Journal: IEEE Transactions on Information Theory, vol. 71, no. 5, pp. 3457-3472
Published: 2025-03-04
DOI: 10.1109/TIT.2025.3545662
URL: https://ieeexplore.ieee.org/document/10910086/
Citation count: 0
Abstract
DNA labeling is a powerful tool in molecular biology and biotechnology that allows for the visualization, detection, and study of DNA at the molecular level. Under this paradigm, a DNA molecule is labeled with k specific patterns and is then imaged. The resulting image is modeled as a $(k+1)$-ary sequence in which any non-zero symbol indicates the appearance of the corresponding label in the DNA molecule. The primary goal of this work is to study the labeling capacity, defined as the maximal information rate that can be obtained using this labeling process. The labeling capacity is computed for almost any pattern of a single label, and several results for multiple labels are provided as well. Moreover, we provide the minimum number of labels of length one or two, over any alphabet of size q, needed to achieve the maximum labeling capacity of $\log_{2}(q)$. Lastly, we discuss the maximal labeling capacity that can be achieved using a given number of labels of length two.
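The labeling model described above can be illustrated with a small sketch. It rests on assumptions the abstract does not spell out: position i of the image holds symbol j when label j starts at position i of the DNA sequence and 0 otherwise, and the first matching label wins if several match. The function names `label_sequence` and `empirical_rate` are hypothetical, and the finite-length rate computed here is only a brute-force illustration, not the capacity itself, which is a limit as the sequence length grows.

```python
from itertools import product
from math import log2


def label_sequence(seq, labels):
    """Return the (k+1)-ary image of seq under the given k labels.

    Assumed model: entry i is j (1-based) if labels[j-1] starts at
    position i of seq, and 0 if no label starts there.
    """
    image = []
    for i in range(len(seq)):
        symbol = 0
        for j, label in enumerate(labels, start=1):
            if seq.startswith(label, i):
                symbol = j
                break  # assumption: the first matching label wins
        image.append(symbol)
    return tuple(image)


def empirical_rate(labels, n, alphabet="ACGT"):
    """Bits per symbol distinguishable at length n: log2(#distinct images) / n."""
    images = {
        label_sequence("".join(s), labels)
        for s in product(alphabet, repeat=n)
    }
    return log2(len(images)) / n


if __name__ == "__main__":
    print(label_sequence("ACGTAC", ["AC"]))      # single label of length two
    print(empirical_rate(["A"], 4))              # one label of length one
    print(empirical_rate(["A", "C", "G"], 4))    # three labels of length one
```

Under this assumed model, labeling three of the four nucleotides individually makes every sequence produce a distinct image, since an unlabeled position can only be the remaining nucleotide, so the finite-length rate equals $\log_{2}(4) = 2$ bits per symbol. A single length-one label yields only a binary image and hence at most 1 bit per symbol.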
About the journal:
The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.