Research of Distributed Query and Optimization Method Based on Metadata

The Open Automation and Control Systems Journal, Pub Date: 2015-10-20, DOI: 10.2174/1874444301507011759

Huaiyuan Wang

Citations: 0

Abstract

This paper studies a distributed query method based on metadata, in which metadata defines and manages virtual tables that contain the key information of each data source. For different data scales, two query and optimization solutions are designed, one for ordinary data volumes and one for very large data volumes. For ordinary data, queries are implemented with the virtual table, a syntax analysis tree, and an in-memory database, and are optimized by copying, moving, and dividing branches of the virtual SQL query syntax tree. For very large data, queries are implemented with Pig, Hadoop, and Python, and are optimized by tuning the Pig code, using multiple processes, merging files and handling file uploads and downloads in HDFS, and building indexes for high-frequency business queries, among other measures.
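The ordinary-data path can be pictured with a small sketch. It is not the paper's implementation: the metadata layout, the names `VIRTUAL_TABLES`, `SOURCES`, `make_source`, and `query_virtual`, and the sample rows are all hypothetical, and SQLite stands in for both the data sources and the in-memory database. The sketch only illustrates the general idea of copying a filter to every source behind a virtual table and merging the partial results in memory; it does not build or rewrite a real syntax tree.

```python
import sqlite3

# Hypothetical metadata: one virtual table mapped onto two physical sources.
VIRTUAL_TABLES = {
    "orders": {
        "columns": ["id", "region", "amount"],
        "sources": ["shard_a", "shard_b"],
    }
}


def make_source(rows):
    """Stand-in for a remote data source: a separate in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


# Hypothetical sample data spread over two sources.
SOURCES = {
    "shard_a": make_source([(1, "north", 10.0), (2, "south", 25.0)]),
    "shard_b": make_source([(3, "north", 40.0), (4, "east", 5.0)]),
}


def query_virtual(table, where_sql, params=()):
    """Run the same filter on every source, then merge the rows in memory."""
    meta = VIRTUAL_TABLES[table]
    cols = ", ".join(meta["columns"])
    placeholders = ", ".join("?" for _ in meta["columns"])

    # The merged result lives in its own in-memory database.
    merged = sqlite3.connect(":memory:")
    merged.execute(f"CREATE TABLE {table} ({cols})")

    for name in meta["sources"]:
        # The filter is copied to each source so only matching rows travel.
        rows = SOURCES[name].execute(
            f"SELECT {cols} FROM {table} WHERE {where_sql}", params
        ).fetchall()
        merged.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})", rows
        )
    return merged


merged = query_virtual("orders", "region = ?", ("north",))
total = merged.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

Applying the filter at each source before merging keeps the amount of data moved into the in-memory database small, which is the kind of saving that branch-level rewriting of a distributed query aims for.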


