A Data Reusing Strategy Based on Column-Stores

2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing. Pub Date: 2013-12-21. DOI: 10.1109/DASC.2013.56

Mei Wang, Jiaoling Zhou, Yue Li, Xiaoling Xia, Jiajin Le

Citations: 0

Abstract

Data reuse is an important way to save storage capacity and improve query efficiency in the management of massive data. The column-store architecture stores the data of each column contiguously, which greatly improves the performance of read-optimized applications and also increases the feasibility and flexibility of data reuse. In this paper, we propose a novel reuse method for column-store data warehouses. First, we propose an improved iMAP method, based on schema mapping, that generates as many candidate reusable columns as possible and then filters these candidates further, which greatly reduces the complexity of detecting reusable data. We then describe how reuse is implemented at the storage layer of the column-store architecture. Finally, we provide a method for executing queries over the reusable data. Experimental results on real data sets indicate that the presented strategy effectively reduces both storage space and query execution time.
