Space-Efficient Computation of the Burrows-Wheeler Transform

José Fuentes-Sepúlveda, G. Navarro, Yakov Nekrich
2019 Data Compression Conference (DCC). Published 2019-03-26. DOI: 10.1109/DCC.2019.00021 (https://doi.org/10.1109/DCC.2019.00021)
Citations: 5

Abstract

The Burrows-Wheeler Transform (BWT) has become an essential tool for compressed text indexing. Computing it efficiently and within little space is essential for the practicality of the indexes that build on it. A recent algorithm (Munro, Navarro & Nekrich, SODA 2017) computes the BWT in O(n) time using O(n lg σ) bits of space for a text of length n over an alphabet of size σ. The result is of a theoretical nature and its practicality is far from obvious. In this paper we engineer their solution and show that, while a basic implementation is slow in practice, the algorithm is amenable to parallelization. For a wide range of alphabet sizes, our resulting implementation outperforms all the compact constructions in the space/time tradeoff map. On the smallest alphabets we are outperformed in time, but nevertheless achieve the least space within reasonable time. For example, in DNA sequences, the most widely used application of BWTs, our construction uses 4.84 bits per base and builds the BWT at a rate of 2.13 megabases per second, whereas the closest previous alternative uses around 7.09 bits per base and runs at 4.17 megabases per second.
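
For readers unfamiliar with the transform itself, the following is a minimal sketch of what a BWT computes, using the textbook sorted-rotations definition. It is not the algorithm of Munro, Navarro & Nekrich nor the engineered construction described in the abstract: it takes roughly O(n² lg n) time and O(n²) space, which is the kind of cost that space-efficient constructions are designed to avoid. The function name `naive_bwt` and the `$` sentinel are assumptions chosen for illustration.

```python
def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Return the BWT of `text` by sorting all rotations of text + sentinel.

    Illustrative only: quadratic space, not a space-efficient construction.
    Assumes (hypothetical convention) that `sentinel` is a single character
    that sorts strictly before every character of `text`.
    """
    if len(sentinel) != 1 or any(c <= sentinel for c in text):
        raise ValueError("sentinel must be one character smaller than all text characters")
    s = text + sentinel
    # Build every cyclic rotation of s and sort them lexicographically.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(naive_bwt("banana"))  # prints "annb$aa"
```

Note how the output groups equal characters together (`nn`, `aa`); that clustering is what makes the BWT useful for compression and compressed indexing.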