The hyperdyadic index and generalized indexing and query with PIQUE

David A. Boyuka, Houjun Tang, Kushal Bansal, Xiaocheng Zou, S. Klasky, N. Samatova
DOI: 10.1145/2791347.2791374 (https://doi.org/10.1145/2791347.2791374)
Published in: Proceedings of the 27th International Conference on Scientific and Statistical Database Management
Publication date: 2015-06-29
Citations: 3

Abstract

Many scientists rely on indexing and query to identify trends and anomalies within extreme-scale scientific data. Compressed bitmap indexing (e.g., FastBit) is the go-to indexing method for many scientific datasets and query workloads. Recently, the ALACRITY compressed inverted index was shown to be a viable alternative approach. Notably, though FastBit and ALACRITY employ very different data structures (inverted list vs. bitmap) and binning methods (bit-wise vs. decimal-precision), close examination reveals marked similarities in index structure. Motivated by this observation, we ask two questions. First, "Can we generalize FastBit and ALACRITY to an index model encompassing both?" And second, if so, "Can such a generalized framework enable other, new indexing methods?" This paper answers both questions in the affirmative. First, we present PIQUE, a Parallel Indexing and Query Unified Engine, based on formal mathematical decomposition of the indexing process. PIQUE factors out commonalities in indexing, employing algorithmic/data structure "plugins" to mix orthogonal indexing concepts such as FastBit compressed bitmaps with ALACRITY binning, all within one framework. Second, we define the hyperdyadic tree index, distinct from both bitmap and inverted indexes, demonstrating good index compression while maintaining high query performance. We implement the hyperdyadic tree index within PIQUE, reinforcing our unified indexing model. We conduct a performance study of the hyperdyadic tree index vs. WAH compressed bitmaps, both within PIQUE and compared to FastBit, a state-of-the-art bitmap index system. The hyperdyadic tree index shows a 1.14-1.90x storage reduction vs. compressed bitmaps, with comparable or better query performance under most scenarios tested.
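
The structural similarity the abstract points to can be illustrated with a small Python sketch. It is not the PIQUE, FastBit, or ALACRITY implementation: all function names, the equal-width bin edges, and the sample data are assumptions made for illustration, and no compression (such as WAH) is applied. The sketch bins the values once, then represents each bin's set of record IDs either as an inverted list or as a bitmap, and shows that a bin-range query returns the same records from both.

```python
from bisect import bisect_right

def assign_bins(values, edges):
    """Map each value to a bin id; bin i covers [edges[i], edges[i+1]).
    Assumes edges[0] <= v < edges[-1] for every value (hypothetical setup)."""
    return [bisect_right(edges, v) - 1 for v in values]

def build_inverted_index(bin_ids, num_bins):
    """Inverted-list representation: one ascending list of record ids per bin."""
    lists = [[] for _ in range(num_bins)]
    for rid, b in enumerate(bin_ids):
        lists[b].append(rid)
    return lists

def build_bitmap_index(bin_ids, num_bins):
    """Bitmap representation: one uncompressed bitmap per bin, held as an int.
    Bit `rid` is set when record `rid` falls into that bin."""
    bitmaps = [0] * num_bins
    for rid, b in enumerate(bin_ids):
        bitmaps[b] |= 1 << rid
    return bitmaps

def query_inverted(lists, lo_bin, hi_bin):
    """Record ids in bins lo_bin..hi_bin (inclusive), by merging lists."""
    return sorted(rid for b in range(lo_bin, hi_bin + 1) for rid in lists[b])

def query_bitmap(bitmaps, lo_bin, hi_bin):
    """Record ids in bins lo_bin..hi_bin (inclusive), by OR-ing bitmaps."""
    result = 0
    for b in range(lo_bin, hi_bin + 1):
        result |= bitmaps[b]
    return [rid for rid in range(result.bit_length()) if (result >> rid) & 1]

# Illustrative data only.
values = [0.5, 3.2, 1.7, 2.9, 0.1, 3.9, 2.0]
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
num_bins = len(edges) - 1

bin_ids = assign_bins(values, edges)
inverted = build_inverted_index(bin_ids, num_bins)
bitmaps = build_bitmap_index(bin_ids, num_bins)

# Both representations answer "records with value in [1.0, 3.0)" identically.
print(query_inverted(inverted, 1, 2))
print(query_bitmap(bitmaps, 1, 2))
```

In this sketch the binning step and the query plan (select bins, then combine their record sets) are shared, and only the per-bin set representation differs. That shared part is the kind of commonality the abstract describes PIQUE as factoring out behind plugins.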