Online LZ77 Parsing and Matching Statistics with RLBWTs

H. Bannai, T. Gagie, I. Tomohiro
Annual Symposium on Combinatorial Pattern Matching (CPM), published 2018-02-16. DOI: 10.4230/LIPIcs.CPM.2018.7 (https://doi.org/10.4230/LIPIcs.CPM.2018.7)
Citations: 13

Abstract

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse $T^R$ of a text $T$, to compute offline the LZ77 parse of $T$ in $O (n \log r)$ time and $O (r)$ space, where $n$ is the length of $T$ and $r$ is the number of runs in the BWT of $T^R$. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, to work with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build online the LZ77 parse of $T$ while still using $O (n \log r)$ time and $O (r)$ space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further --- albeit making it static again and increasing its space by a factor proportional to the size of the alphabet --- such that later, given another string $S$ and $O (\log \log n)$-time random access to $T$, we can compute the matching statistics of $S$ with respect to $T$ in $O (|S| \log \log n)$ time.
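
To make the three objects named in the abstract concrete, the following is a naive reference sketch in Python. It is not the paper's method: the paper works with an augmented RLBWT in $O (r)$ space, whereas this code builds everything by brute force and is only meant to show what is being computed. The function names, the `$` sentinel (assumed not to occur in the text), and the particular LZ77 variant (each phrase is the longest prefix of the remaining text that also starts at an earlier position, or a single new character) are assumptions made for illustration.

```python
def bwt_runs(text: str, sentinel: str = "$") -> tuple[str, int]:
    """Return the BWT of text + sentinel and its number of runs r.

    Built from sorted rotations, so this is for small inputs only.
    Assumes the sentinel does not occur in text and sorts before all
    of its characters.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    # A new run starts wherever two adjacent BWT characters differ.
    runs = 1 + sum(1 for a, b in zip(bwt, bwt[1:]) if a != b)
    return bwt, runs


def lz77_parse(text: str) -> list[str]:
    """Greedy LZ77 factorization (one common variant, see lead-in).

    Each phrase is the longest prefix of text[i:] that also starts at
    some earlier position j < i (overlap allowed), or one new character.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a character with no earlier copy is its own phrase
        phrases.append(text[i:i + step])
        i += step
    return phrases


def matching_statistics(s: str, t: str) -> list[int]:
    """For each i, the length of the longest prefix of s[i:] occurring in t."""
    ms = []
    for i in range(len(s)):
        length = 0
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    T = "banana"
    print(bwt_runs(T))                        # BWT of T and its run count
    print(bwt_runs(T[::-1]))                  # the paper indexes the reverse T^R
    print(lz77_parse(T))
    print(matching_statistics("bandana", T))
```

The run count returned by `bwt_runs` plays the role of $r$ in the bounds above: on repetitive strings it is typically far smaller than $n$, which is what makes $O (r)$ space attractive.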