Permutation coding using divide-and-conquer strategy

Kun Tu, D. Puchala
2023 Data Compression Conference (DCC), 2023. DOI: 10.1109/DCC55655.2023.00046

Abstract

In computer science, permutations are used in tasks such as pattern searching, duplicate-document detection, and data compression [1], [2]. Reducing their redundancy to obtain succinct representations is therefore of great importance. In this paper, we introduce a novel method for the succinct representation of permutations in which the average number of bits per element required to encode a permutation is $\log_{2}n-1.269$. This is close to the theoretical limit of $\frac{1}{n}\log_{2}n!$ bits per element, which approaches $\log_{2}n-\log_{2}e\approx\log_{2}n-1.443$ for large $n$. Furthermore, precise expressions can be formulated for the average value and for the lower and upper bounds of the number of bits required by the method.

Let $n$ be an integer power of 2. The proposed method can then be described as follows: (i) it follows the “divide-and-conquer” strategy, and at each stage the considered permutation is divided into two equal halves (bins); (ii) binary encoding describes the assignment of elements to bins (‘0’ for the first bin, ‘1’ for the second); (iii) depending on the permutation, some bits can be omitted, because once one bin is full the remaining elements must go to the other bin; this leads to a succinct representation.

For instance, let $\pi_{2}=(0,2,1,3,7,6,4,5)$. We start with the identity permutation $\pi_{1}=(0,1,2,3,4,5,6,7)$. At the first stage, $\pi_{1}$ is split into two bins according to $\pi_{2}$, giving $\pi_{1}=(0,1,2,3|4,5,6,7)$, which is encoded with the bits ‘0000’. At the second stage, the same operation is repeated within each bin, giving $\pi_{1}=(0,2|1,3|6,7|4,5)$ and the coding bits ‘01011’. Finally, at the last stage, we obtain $\pi_{1}=\pi_{2}=(0|2|1|3|7|6|4|5)$, encoded as ‘0010’. Concatenating these bits gives the unique 13-bit code $C=0000010110010$ for $\pi_{2}$.

The lower and upper bounds on the per-element code length $\frac{1}{n}|C|$ are $G^{\min}(n)=\frac{1}{2}\log_{2}n$ and $G^{\max}(n)=\log_{2}n-\left(1-\frac{1}{n}\right)$. The average number of bits per element required to encode permutations can be calculated as:
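The Python sketch below is a reconstruction of the scheme from the worked example, not the authors' code. It assumes that at every stage each bin is split stably (elements keep their current order within each half), that one bit is emitted per element while scanning the bin in that order, and that no bits are emitted once either half is full. The names `encode`, `decode`, `_split` and `per_element_stats` are hypothetical.

```python
from itertools import permutations
from math import isclose, log2


def _split(current, half, choose):
    """Stable split of one bin into two halves of size `half`.

    `choose(x)` returns '0' (first half) or '1' (second half). Once either half
    is full, the remaining elements are forced and `choose` is not called, so
    no bit is spent on them (assumed omission rule).
    """
    left, right = [], []
    for x in current:
        if len(left) == half:
            right.append(x)
        elif len(right) == half:
            left.append(x)
        elif choose(x) == "0":
            left.append(x)
        else:
            right.append(x)
    return left, right


def encode(perm):
    """Encode a permutation of range(n), n a power of 2, as a bit string."""
    n = len(perm)
    if n == 0 or n & (n - 1) or sorted(perm) != list(range(n)):
        raise ValueError("expected a permutation of range(n) with n a power of 2")
    bits = []
    bins, size = [list(range(n))], n  # start from the identity permutation pi_1
    while size > 1:
        half, next_bins = size // 2, []
        for b, current in enumerate(bins):
            # Elements that pi_2 places in the first half of this bin's segment.
            first = set(perm[b * size : b * size + half])

            def choose(x):
                bit = "0" if x in first else "1"
                bits.append(bit)
                return bit

            next_bins += _split(current, half, choose)
        bins, size = next_bins, half
    return "".join(bits)


def decode(code, n):
    """Invert `encode`: rebuild the permutation of range(n) from its bit string."""
    bits = iter(code)
    bins, size = [list(range(n))], n
    while size > 1:
        half, next_bins = size // 2, []
        for current in bins:
            # A bit is read only while the element's bin is not yet forced.
            next_bins += _split(current, half, lambda x: next(bits))
        bins, size = next_bins, half
    if next(bits, None) is not None:
        raise ValueError("code is longer than expected")
    return [b[0] for b in bins]  # each final bin holds one element


def per_element_stats(n):
    """Min, max and mean of |C|/n over all n! permutations (small n only)."""
    rates = [len(encode(p)) / n for p in permutations(range(n))]
    return min(rates), max(rates), sum(rates) / len(rates)


if __name__ == "__main__":
    pi2 = [0, 2, 1, 3, 7, 6, 4, 5]
    code = encode(pi2)
    print(code, decode(code, 8) == pi2)  # 0000010110010 True
    lo, hi, mean = per_element_stats(8)
    print(isclose(lo, 0.5 * log2(8)), isclose(hi, log2(8) - (1 - 1 / 8)))  # True True
    print(f"mean bits/element for n=8: {mean:.4f}")
```

Under these assumptions the sketch reproduces the code $C$ of the example, and an exhaustive run over all $8! = 40320$ permutations attains exactly $G^{\min}(8)=1.5$ and $G^{\max}(8)=2.125$ bits per element. Exhaustive enumeration is only practical for small $n$; the decoder needs only $n$ and the bit string because the omitted bits are implied by the bin sizes.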