缓存友好的陋居-惠勒反转

2011 First International Conference on Data Compression, Communications and Processing Pub Date : 2011-06-21 DOI:10.1109/CCP.2011.15

Juha Kärkkäinen, S. Puglisi

{"title":"缓存友好的陋居-惠勒反转","authors":"Juha Kärkkäinen, S. Puglisi","doi":"10.1109/CCP.2011.15","DOIUrl":null,"url":null,"abstract":"The Burrows-Wheeler transform permutes the symbols of a string such that the permuted string can be compressed effectively with fast, simple techniques. Inversion of the transform is a bottleneck in practice. Inversion takes linear time, but, for each symbol decoded, folklore says that a random access into the transformed string (and so a CPU cache-miss) is necessary. In this paper we show how to mitigate cache misses and so speed inversion. Our main idea is to modify the standard inversion algorithm to detect and record repeated sub strings in the original string as it is recovered. Subsequent occurrences of these repetitions are then copied in a cache friendly way from the already recovered portion of the string, short cutting a series of random accesses by the standard inversion algorithm. We show experimentally that this approach leads to faster runtimes in general, and can drastically reduce inversion time for highly repetitive data.","PeriodicalId":167131,"journal":{"name":"2011 First International Conference on Data Compression, Communications and Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Cache Friendly Burrows-Wheeler Inversion\",\"authors\":\"Juha Kärkkäinen, S. Puglisi\",\"doi\":\"10.1109/CCP.2011.15\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Burrows-Wheeler transform permutes the symbols of a string such that the permuted string can be compressed effectively with fast, simple techniques. Inversion of the transform is a bottleneck in practice. Inversion takes linear time, but, for each symbol decoded, folklore says that a random access into the transformed string (and so a CPU cache-miss) is necessary. In this paper we show how to mitigate cache misses and so speed inversion. Our main idea is to modify the standard inversion algorithm to detect and record repeated sub strings in the original string as it is recovered. Subsequent occurrences of these repetitions are then copied in a cache friendly way from the already recovered portion of the string, short cutting a series of random accesses by the standard inversion algorithm. We show experimentally that this approach leads to faster runtimes in general, and can drastically reduce inversion time for highly repetitive data.\",\"PeriodicalId\":167131,\"journal\":{\"name\":\"2011 First International Conference on Data Compression, Communications and Processing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 First International Conference on Data Compression, Communications and Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCP.2011.15\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 First International Conference on Data Compression, Communications and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCP.2011.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

Burrows-Wheeler变换对字符串的符号进行排列，这样排列后的字符串可以用快速、简单的技术有效地压缩。变换的反演是实际应用中的瓶颈。反转需要线性时间，但是，对于解码的每个符号，一般认为随机访问转换后的字符串(因此CPU缓存丢失)是必要的。在本文中，我们展示了如何减少缓存丢失，从而加快反转。我们的主要想法是修改标准反转算法，以便在原始字符串恢复时检测并记录重复的子字符串。然后，这些重复的后续出现以缓存友好的方式从已经恢复的字符串部分复制，通过标准反转算法缩短了一系列随机访问。我们通过实验证明，这种方法通常会导致更快的运行时间，并且可以大大减少高度重复数据的反演时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cache Friendly Burrows-Wheeler Inversion

The Burrows-Wheeler transform permutes the symbols of a string such that the permuted string can be compressed effectively with fast, simple techniques. Inversion of the transform is a bottleneck in practice. Inversion takes linear time, but, for each symbol decoded, folklore says that a random access into the transformed string (and so a CPU cache-miss) is necessary. In this paper we show how to mitigate cache misses and so speed inversion. Our main idea is to modify the standard inversion algorithm to detect and record repeated sub strings in the original string as it is recovered. Subsequent occurrences of these repetitions are then copied in a cache friendly way from the already recovered portion of the string, short cutting a series of random accesses by the standard inversion algorithm. We show experimentally that this approach leads to faster runtimes in general, and can drastically reduce inversion time for highly repetitive data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 First International Conference on Data Compression, Communications and Processing

自引率

0.00%

发文量