Identifying Hierarchical Structures in Sequences on GPU
P. Jalan, A. Jain, Subhajit Roy
2015 IEEE Trustcom/BigDataSE/ISPA, 2015-08-20
DOI: 10.1109/Trustcom.2015.609 (https://doi.org/10.1109/Trustcom.2015.609)
Citations: 2
Abstract
Identifying hierarchical structures in sequences is an important problem, with applications ranging from lossless data compression to program profiling. A popular algorithm for this task is Sequitur, developed by Nevill-Manning and Witten. Sequitur is not just a compression algorithm; it attempts to learn the hierarchical structure of the input sequence as a context-free grammar. However, Sequitur is difficult to parallelize. Inspired by Sequitur, we have developed a new GPU algorithm that reveals the hierarchical structure in sequences and is also concurrency-friendly. Our algorithm, Pequitur, is built as a series of fast kernels (for intermittent synchronization), where each kernel attempts to minimize inter-thread communication and achieve a good load balance among the GPU threads. As opposed to Sequitur, Pequitur follows a greedy strategy to find good productions, that is, productions formed by long and frequent substrings. We have implemented our algorithm and evaluated it on the NVIDIA K20c card using random strings drawn from multiple distributions. On our benchmarks, Pequitur achieves an average speedup of more than 3X over an optimized Sequitur implementation, with similar compression ratios.
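To make concrete what it means to represent a sequence as a context-free grammar, the following sequential Python sketch may help. It is neither Pequitur nor Sequitur: the abstract does not give either algorithm in detail, so the sketch swaps in a simple Re-Pair-style heuristic that greedily replaces the most frequent adjacent pair of symbols with a new production. The names `infer_grammar` and `expand` and the rule labels `R0`, `R1`, ... are hypothetical and chosen only for illustration.

```python
from collections import Counter


def infer_grammar(text):
    """Greedy pair replacement (Re-Pair style), NOT the paper's Pequitur.

    Returns (start_sequence, rules), where rules maps a nonterminal
    name to the pair of symbols it expands to.
    """
    seq = list(text)   # terminals are single characters
    rules = {}         # nonterminal name -> (left symbol, right symbol)
    next_id = 0
    while True:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break      # no pair repeats, so no further production pays off
        # Nonterminal names have at least two characters, so they cannot
        # collide with single-character terminals.
        name = f"R{next_id}"
        next_id += 1
        rules[name] = pair
        # Replace non-overlapping occurrences, scanning left to right.
        out = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a symbol back into the terminals it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Calling `infer_grammar("abcabcabcabc")` yields a short start sequence plus nested rules, and joining `expand(s, rules)` over the start sequence reproduces the input, so the grammar is a lossless encoding. Each iteration here depends on the result of the previous one, which illustrates in a small way why grammar inference of this kind does not parallelize trivially.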