Bee Identification Problem for DNA Strands

IEEE Journal on Selected Areas in Information Theory. Pub Date: 2023-01-01. DOI: 10.1109/JSAIT.2023.3294423

Johan Chrisnata; Han Mao Kiah; Alexander Vardy; Eitan Yaakobi

Volume 4, pages 190-204. Full text: https://ieeexplore.ieee.org/document/10179132/


Abstract

Motivated by DNA-based applications, we generalize the bee identification problem proposed by Tandon et al. (2019). In this setup, we transmit all $M$ codewords from a codebook over some channel and each codeword results in $N$ noisy outputs. Then our task is to identify each codeword from this unordered set of $MN$ noisy outputs. First, via a reduction to a minimum-cost flow problem on a related bipartite flow network called the input-output flow network, we show that the problem can be solved in $O(M^{3})$ time in the worst case. Next, we consider the deletion and the insertion channels individually, and in both cases, we study the expected number of edges in their respective input-output networks. Specifically, we obtain closed expressions for this quantity for certain codebooks, and when the codebook comprises all binary words, we show that this quantity is sub-quadratic when the deletion or insertion probability is less than 1/2. This then implies that the expected running time to perform joint decoding for this codebook is $o(M^{3})$. For other codebooks, we develop methods to compute the expected number of edges efficiently. Finally, we adapt classical peeling-decoding techniques to reduce the number of nodes and edges in the input-output flow network.
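
To make the matching formulation concrete, here is a minimal Python sketch for a toy instance. It is an illustration under stated assumptions, not the authors' algorithm: it assumes $N = 1$ (one output per codeword), an i.i.d. deletion channel with deletion probability `p`, and edge costs equal to the negative log-likelihood of an output given a codeword. An edge is taken to exist only when the output is a subsequence of the codeword. The helper names (`count_embeddings`, `deletion_cost`, `identify`) are hypothetical, and the exhaustive search over permutations is a stand-in for a real minimum-cost flow solver, so it is only usable for very small $M$.

```python
import math
from itertools import permutations


def count_embeddings(x: str, y: str) -> int:
    """Number of ways y occurs as a subsequence of x."""
    # ways[j] = number of embeddings of y[:j] in the prefix of x scanned so far
    ways = [1] + [0] * len(y)
    for ch in x:
        # Go right to left so each symbol of x is used at most once per embedding.
        for j in range(len(y), 0, -1):
            if y[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(y)]


def deletion_cost(x: str, y: str, p: float) -> float:
    """-log P(y | x) for i.i.d. deletions with probability 0 < p < 1.

    Returns math.inf when y cannot be produced from x (no edge).
    """
    k = count_embeddings(x, y)
    if k == 0:
        return math.inf
    deleted = len(x) - len(y)
    return -(math.log(k) + deleted * math.log(p) + len(y) * math.log(1 - p))


def identify(codebook, outputs, p):
    """Assumed toy decoder: exhaustive minimum-cost perfect matching.

    Returns (assignment, cost), where assignment[i] is the index of the
    output matched to codebook[i], or None if no perfect matching exists.
    """
    m = len(codebook)
    cost = [[deletion_cost(x, y, p) for y in outputs] for x in codebook]
    best_total, best_perm = math.inf, None
    for perm in permutations(range(m)):
        total = sum(cost[i][perm[i]] for i in range(m))
        if total < best_total:
            best_total, best_perm = total, perm
    return best_perm, cost


# Toy instance: three codewords, each with one deleted symbol, outputs shuffled.
codebook = ["0011", "0101", "1110"]
outputs = ["110", "101", "001"]
assignment, cost = identify(codebook, outputs, p=0.25)
num_edges = sum(c != math.inf for row in cost for c in row)
```

In this toy instance only 4 of the 9 possible codeword-output pairs are edges, and the matching is forced to `(2, 1, 0)`. The sparser the network, the less work a flow solver has to do, which is why the number of edges is the quantity of interest.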



