Research of Distributed Query and Optimization Method Based on Metadata

Huaiyuan Wang
Journal: The Open Automation and Control Systems Journal
Published: 2015-10-20
DOI: https://doi.org/10.2174/1874444301507011759

Abstract

This paper studies a distributed query method based on metadata, in which metadata is used to define and manage a virtual table containing the key information of each data source. Two solutions for query and optimization are then designed for different data scales, one for common data and one for huge data. For common data, the query is implemented with the virtual table, a syntax analysis tree, and an in-memory database, and it is optimized by copying, moving, and dividing branches of the virtual SQL query syntax tree. For huge data, queries are implemented with Pig, Hadoop, and Python, and they are optimized by tuning the Pig code, using multiple processes, merging files and handling file upload and download in HDFS, building indexes for high-frequency business queries, and similar measures.
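The abstract does not give the metadata schema or the rewriting rules, so the following is only a minimal sketch of the general idea of a metadata-defined virtual table: each virtual column is mapped to a physical source, table, and column, and a query over the virtual table is divided into one sub-query per source. All names here (`ColumnMapping`, `VIRTUAL_TABLE`, `split_query`, and the sample sources `crm_db` and `sales_db`) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Where one virtual column physically lives (hypothetical schema)."""
    source: str  # name of the data source
    table: str   # physical table inside that source
    column: str  # physical column name


# Hypothetical metadata: virtual column name -> physical location.
VIRTUAL_TABLE = {
    "customer_id": ColumnMapping("crm_db", "customers", "id"),
    "customer_name": ColumnMapping("crm_db", "customers", "name"),
    "order_total": ColumnMapping("sales_db", "orders", "total"),
}


def split_query(virtual_columns, metadata=VIRTUAL_TABLE):
    """Divide a projection over the virtual table into one SELECT per
    (source, table) pair, using only the metadata mapping."""
    grouped = {}
    for name in virtual_columns:
        mapping = metadata[name]  # raises KeyError for unknown columns
        key = (mapping.source, mapping.table)
        grouped.setdefault(key, []).append(f"{mapping.column} AS {name}")
    return {
        key: f"SELECT {', '.join(columns)} FROM {key[1]}"
        for key, columns in grouped.items()
    }


if __name__ == "__main__":
    plan = split_query(["customer_id", "order_total", "customer_name"])
    for (source, table), sql in plan.items():
        print(f"{source}: {sql}")
```

In this sketch the caller only names virtual columns; the physical layout can change by editing the metadata alone. Joining the per-source results, which the abstract assigns to an in-memory database, is left out.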