{"title":"r-indexing the eBWT","authors":"Christina Boucher , Davide Cenzato , Zsuzsanna Lipták , Massimiliano Rossi , Marinella Sciortino","doi":"10.1016/j.ic.2024.105155","DOIUrl":null,"url":null,"abstract":"<div><p>The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the <em>extended r-index</em>, an analogous data structure to the <em>r</em>-index [Gagie et al. JACM 2020]. It occupies <span><math><mi>O</mi><mo>(</mo><mi>r</mi><mo>)</mo></math></span> words, with <em>r</em> the number of runs of the eBWT, and offers the same functionalities as the <em>r</em>-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended <em>r</em>-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original <em>r</em>-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., <em>r-indexing the</em> eBWT, SPIRE 2021].</p></div>","PeriodicalId":54985,"journal":{"name":"Information and Computation","volume":"298 ","pages":"Article 105155"},"PeriodicalIF":0.8000,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0890540124000208/pdfft?md5=473e3c991b33dac12225d2a9ce2daad7&pid=1-s2.0-S0890540124000208-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information and Computation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0890540124000208","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
The extended Burrows-Wheeler Transform (eBWT) [Mantaci et al. TCS 2007] is a variant of the BWT, introduced for collections of strings. In this paper, we present the extended r-index, an analogous data structure to the r-index [Gagie et al. JACM 2020]. It occupies words, with r the number of runs of the eBWT, and offers the same functionalities as the r-index. We also show how to efficiently support finding maximal exact matches (MEMs). We implemented the extended r-index and tested it on circular bacterial genomes and plasmids, comparing it to five state-of-the-art compressed text indexes. While our data structure maintains similar time and memory requirements for answering pattern matching queries as the original r-index, it is the only index in the literature that can naturally be used for both circular and linear input collections. This is an extended version of [Boucher et al., r-indexing the eBWT, SPIRE 2021].
期刊介绍:
Information and Computation welcomes original papers in all areas of theoretical computer science and computational applications of information theory. Survey articles of exceptional quality will also be considered. Particularly welcome are papers contributing new results in active theoretical areas such as
-Biological computation and computational biology-
Computational complexity-
Computer theorem-proving-
Concurrency and distributed process theory-
Cryptographic theory-
Data base theory-
Decision problems in logic-
Design and analysis of algorithms-
Discrete optimization and mathematical programming-
Inductive inference and learning theory-
Logic & constraint programming-
Program verification & model checking-
Probabilistic & Quantum computation-
Semantics of programming languages-
Symbolic computation, lambda calculus, and rewriting systems-
Types and typechecking