{"title":"Permutation coding using divide-and-conquer strategy","authors":"Kun Tu, D. Puchala","doi":"10.1109/DCC55655.2023.00046","DOIUrl":null,"url":null,"abstract":"In computer science permutations are used, e.g., in the tasks of pattern searching, duplicate documents detection and data compression [1], [2]. For this reason the reduction of redundancy leading to succinct representation of permutations is of great importance. In this paper, we introduce a novel method for succinct representation of permutations where the average number of bits per element required to encode permutations is $\\log_{2}n-1.269$, which is close to the theoretic limit. Furthermore, it is possible to formulate precise expressions for the average value, lower, and upper bounds to the number of bits required by the method. Let n be an integer power of 2. Then the proposed method can be described as follows: (i) the method follows the ‘‘divide-and-conquer’’ strategy and at each stage a considered permutation is divided into two equal halves (bins), (ii) binary encoding is used to describe elements-to-bins assignment (’ 0’-first, ‘l’-second bin), (iii) depending on a permutation some bits can be omitted, which leads to succinct representation. For instance, let $\\pi_{2}=(0,2,1,3,7,6,4,5)$. We start with the identity permutation $\\pi_{1}=(0,1,2,3,4,5,6,7)$. At the first stage $\\pi_{1}$ is split between two bins in relation to $\\pi_{2}$ as $\\pi_{1}=(0,1,2,3|4,5,6,7)$ which is encoded with bits ‘0000’. At the second stage we repeat the same operations leading to $\\pi_{1}=(0,2|1,3|6,7|4,5)$, and formulate the coding bits ‘01011’ Finally, at the last stage, we get $\\pi_{1}=\\pi_{2}=(0|2|1|3|7|6|4|5)$ encoded as ‘0010’. The concatenated bits give the unique code $C=0000010110010$ for $\\pi_{2}$. 
The lower and upper bounds for the length of codes $\\displaystyle \\frac{1}{n}|C|$ are $G^{\\min}(n)=\\displaystyle \\frac{1}{2}\\log_{2}n$ and $G^{\\max}\\left(n\\right)=\\displaystyle \\log_{2}n-\\left(1-\\frac{1}{n}\\right)$. The average number of bits per element required to encode permutations can be calculated as:","PeriodicalId":209029,"journal":{"name":"2023 Data Compression Conference (DCC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 Data Compression Conference (DCC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DCC55655.2023.00046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
In computer science, permutations are used in tasks such as pattern searching, duplicate document detection, and data compression [1], [2]. Reducing redundancy to obtain a succinct representation of permutations is therefore of great importance. In this paper, we introduce a novel method for succinct representation of permutations in which the average number of bits per element required to encode a permutation is $\log_{2}n-1.269$, which is close to the theoretical limit. Furthermore, precise expressions can be formulated for the average value and for the lower and upper bounds on the number of bits required by the method.

Let $n$ be an integer power of 2. The proposed method can then be described as follows: (i) the method follows the “divide-and-conquer” strategy, and at each stage the permutation under consideration is divided into two equal halves (bins); (ii) binary encoding is used to describe the assignment of elements to bins (‘0’ for the first bin, ‘1’ for the second); (iii) depending on the permutation, some bits can be omitted, which leads to a succinct representation. For instance, let $\pi_{2}=(0,2,1,3,7,6,4,5)$. We start with the identity permutation $\pi_{1}=(0,1,2,3,4,5,6,7)$. At the first stage, $\pi_{1}$ is split between two bins in relation to $\pi_{2}$ as $\pi_{1}=(0,1,2,3|4,5,6,7)$, which is encoded with the bits ‘0000’. At the second stage, we repeat the same operations, leading to $\pi_{1}=(0,2|1,3|6,7|4,5)$, and formulate the coding bits ‘01011’. Finally, at the last stage, we get $\pi_{1}=\pi_{2}=(0|2|1|3|7|6|4|5)$, encoded as ‘0010’. The concatenated bits give the unique code $C=0000010110010$ for $\pi_{2}$.

The lower and upper bounds for the per-element code length $\frac{1}{n}|C|$ are $G^{\min}(n)=\frac{1}{2}\log_{2}n$ and $G^{\max}(n)=\log_{2}n-\left(1-\frac{1}{n}\right)$. The average number of bits per element required to encode permutations can be calculated as:
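The expression for the average is not reproduced in this record. The encoding steps (i) to (iii) and the worked example can, however, be sketched in code. The following Python sketch is a reconstruction from the example, not the authors' implementation: the names `encode` and `code_lengths` are hypothetical, and the omission rule is an assumption inferred from the bit counts in the example (elements of a block are visited in ascending order, and once one bin is full the remaining elements are placed without emitting bits).

```python
from itertools import permutations
from math import log2


def encode(perm):
    """Encode a permutation of 0..n-1 (n a power of 2) as a bit string.

    Assumed rule, inferred from the worked example: within each block the
    elements are visited in ascending order; '0' sends an element to the
    first bin, '1' to the second; once a bin is full, the remaining
    elements are determined and no bits are written for them.
    """
    n = len(perm)
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    assert sorted(perm) == list(range(n)), "perm must be a permutation of 0..n-1"
    bits = []
    # Each block pairs a slice of the target permutation with the same
    # elements in their current (ascending) order.
    blocks = [(list(perm), sorted(perm))]
    while len(blocks[0][0]) > 1:
        next_blocks = []
        for target, current in blocks:
            half = len(target) // 2
            first = set(target[:half])  # elements that belong to the first bin
            left, right = [], []
            for x in current:
                if len(left) == half:
                    right.append(x)  # first bin full: bit omitted
                elif len(right) == half:
                    left.append(x)  # second bin full: bit omitted
                elif x in first:
                    bits.append("0")
                    left.append(x)
                else:
                    bits.append("1")
                    right.append(x)
            next_blocks.append((target[:half], left))
            next_blocks.append((target[half:], right))
        blocks = next_blocks
    return "".join(bits)


def code_lengths(n):
    """Code lengths |C| over all permutations of 0..n-1 (small n only)."""
    return [len(encode(p)) for p in permutations(range(n))]


print(encode((0, 2, 1, 3, 7, 6, 4, 5)))  # 0000010110010

n = 8
lengths = code_lengths(n)
# Compare the observed per-element extremes with the stated bounds.
print(min(lengths) / n, 0.5 * log2(n))
print(max(lengths) / n, log2(n) - (1 - 1 / n))
```

Under this reading, the sketch reproduces the code $C=0000010110010$ for $\pi_{2}$, and an exhaustive check for $n=8$ gives per-element extremes that coincide with $G^{\min}(8)=1.5$ and $G^{\max}(8)=2.125$. Enumerating all permutations is only practical for small $n$.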