L-systems for Measuring Repetitiveness
G. Navarro, Cristian Urbina
Annual Symposium on Combinatorial Pattern Matching, published 2022-06-03
DOI: 10.48550/arXiv.2206.01688 (https://doi.org/10.48550/arXiv.2206.01688)
Citations: 0
Abstract
An L-system (for lossless compression) is a CPD0L-system extended with two parameters $d$ and $n$, which unambiguously determines a string $w = \tau(\varphi^d(s))[1:n]$, where $\varphi$ is the morphism of the system, $s$ is its axiom, and $\tau$ is its coding. The length of the shortest description of an L-system generating $w$ is denoted $\ell$; it is arguably a relevant measure of repetitiveness, one that builds on the self-similarities arising in the sequence. In this paper we deepen the study of the measure $\ell$ and its relation with $\delta$, a better-established lower bound that builds on substring complexity. Our results show that $\ell$ and $\delta$ are largely orthogonal, in the sense that either one can be much larger than the other, depending on the case. This suggests that the two sources of repetitiveness are mostly unrelated. We also show that the recently introduced NU-systems, which combine the capabilities of L-systems with bidirectional macro-schemes, can be asymptotically strictly smaller than both mechanisms. This makes the size $\nu$ of the smallest NU-system the unique smallest reachable repetitiveness measure to date.
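To make the two objects compared above concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the authors' code. The helper names `l_system_string` and `delta` are hypothetical. The axiom $s$ is taken to be a string, and the CPD0L conditions are checked only as "every image of $\varphi$ is non-empty" (propagating) and "$\tau$ maps each symbol to exactly one symbol" (coding). $\delta(w)$ is computed naively as $\max_{k \ge 1} S_k(w)/k$, where $S_k(w)$ is the number of distinct length-$k$ substrings of $w$. The sketch does not compute $\ell$, which would require minimizing over all L-systems that generate $w$.

```python
from fractions import Fraction
from typing import Dict


def l_system_string(phi: Dict[str, str], axiom: str, tau: Dict[str, str],
                    d: int, n: int) -> str:
    """Return w = tau(phi^d(axiom))[1:n], i.e. the first n symbols.

    Hypothetical helper. Assumed conventions: phi is a deterministic,
    context-free morphism with non-empty images, and tau is a coding.
    """
    if any(len(img) == 0 for img in phi.values()):
        raise ValueError("phi must be non-erasing (propagating)")
    if any(len(img) != 1 for img in tau.values()):
        raise ValueError("tau must map each symbol to exactly one symbol")
    current = axiom
    for _ in range(d):
        # Rewrite every symbol in parallel with its image under phi.
        current = "".join(phi[c] for c in current)
    coded = "".join(tau[c] for c in current)
    if len(coded) < n:
        raise ValueError("tau(phi^d(axiom)) is shorter than n")
    return coded[:n]  # 1-based [1:n] corresponds to Python's [0:n]


def delta(w: str) -> Fraction:
    """Naive delta(w) = max over k of S_k(w) / k (hypothetical helper).

    S_k(w) is the number of distinct substrings of w of length k.
    Quadratic in |w| substrings; meant only for small examples.
    """
    best = Fraction(0)
    for k in range(1, len(w) + 1):
        s_k = len({w[i:i + k] for i in range(len(w) - k + 1)})
        best = max(best, Fraction(s_k, k))
    return best


if __name__ == "__main__":
    fib = {"a": "ab", "b": "a"}       # Fibonacci morphism
    identity = {"a": "a", "b": "b"}   # trivial coding
    w = l_system_string(fib, "a", identity, d=5, n=10)
    print(w)         # abaababaab
    print(delta(w))  # 2
```

In the usage example, $\varphi^5(a)$ is `abaababaabaab` and its length-10 prefix is `abaababaab`; its only two distinct symbols give $S_1 = 2$, and no larger $k$ exceeds that ratio, so $\delta = 2$. The sketch only evaluates one given system; it says nothing about which system is the shortest.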