PStore: an efficient storage framework for managing scientific data

Souvik Bhattacherjee, A. Deshpande, A. Sussman
{"title":"PStore: an efficient storage framework for managing scientific data","authors":"Souvik Bhattacherjee, A. Deshpande, A. Sussman","doi":"10.1145/2618243.2618268","DOIUrl":null,"url":null,"abstract":"In this paper, we present the design, implementation, and evaluation of PStore, a no-overwrite storage framework for managing large volumes of array data generated by scientific simulations. PStore consists of two modules, a data ingestion module and a query processing module, that respectively address two of the key challenges in scientific simulation data management. The data ingestion module is geared toward handling the high volumes of simulation data generated at a very rapid rate, which often makes it impossible to offload the data onto storage devices; the module is responsible for selecting an appropriate compression scheme for the data at hand, chunking the data, and then compressing it before sending it to the storage nodes. On the other hand, the query processing module is in charge of efficiently executing different types of queries over the stored data; in this paper, we specifically focus on dicing (also called range) queries. PStore provides a suite of compression schemes that leverage, and in some cases extend, existing techniques to provide support for diverse scientific simulation data. To efficiently execute queries over such compressed data, PStore adopts and extends a two-level chunking scheme by incorporating the effect of compression, and hides expensive disk latencies for long running range queries by exploiting chunk prefetching. In addition, we also parallelize the query processing module to further speed up execution. We evaluate PStore on a 140 GB dataset obtained from real-world simulations using the regional climate model CWRF [5]. In this paper, we use both 3D and 4D datasets and demonstrate high performance through extensive experiments.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"1 1","pages":"25:1-25:12"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2618243.2618268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

In this paper, we present the design, implementation, and evaluation of PStore, a no-overwrite storage framework for managing large volumes of array data generated by scientific simulations. PStore consists of two modules, a data ingestion module and a query processing module, that respectively address two of the key challenges in scientific simulation data management. The data ingestion module is geared toward handling the high volumes of simulation data generated at a very rapid rate, which often makes it impossible to offload the data onto storage devices; the module is responsible for selecting an appropriate compression scheme for the data at hand, chunking the data, and then compressing it before sending it to the storage nodes. On the other hand, the query processing module is in charge of efficiently executing different types of queries over the stored data; in this paper, we specifically focus on dicing (also called range) queries. PStore provides a suite of compression schemes that leverage, and in some cases extend, existing techniques to provide support for diverse scientific simulation data. To efficiently execute queries over such compressed data, PStore adopts and extends a two-level chunking scheme by incorporating the effect of compression, and hides expensive disk latencies for long running range queries by exploiting chunk prefetching. In addition, we also parallelize the query processing module to further speed up execution. We evaluate PStore on a 140 GB dataset obtained from real-world simulations using the regional climate model CWRF [5]. In this paper, we use both 3D and 4D datasets and demonstrate high performance through extensive experiments.
PStore:用于管理科学数据的高效存储框架
在本文中,我们介绍了PStore的设计,实现和评估,PStore是一个用于管理由科学模拟生成的大量阵列数据的无覆盖存储框架。PStore包括两个模块,一个数据摄取模块和一个查询处理模块,分别解决了科学仿真数据管理中的两个关键挑战。数据摄取模块旨在处理以非常快的速度生成的大量模拟数据,这通常使数据无法卸载到存储设备上;该模块负责为手头的数据选择适当的压缩方案,将数据分块,然后在将其发送到存储节点之前对其进行压缩。另一方面,查询处理模块负责对存储的数据有效地执行不同类型的查询;在本文中,我们特别关注dicding(也称为range)查询。PStore提供了一套压缩方案,这些方案利用(在某些情况下扩展)现有技术,为各种科学模拟数据提供支持。为了有效地执行对这些压缩数据的查询,PStore采用并扩展了两级分块方案,结合了压缩的效果,并通过利用块预取来隐藏长运行范围查询的昂贵的磁盘延迟。此外,我们还将查询处理模块并行化,以进一步加快执行速度。我们使用区域气候模式CWRF[5]在真实世界模拟中获得的140gb数据集上评估了PStore。在本文中,我们使用了3D和4D数据集,并通过大量的实验证明了高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信