Research of Distributed Query and Optimization Method Based on Metadata

The Open Automation and Control Systems Journal | Pub Date: 2015-10-20 | DOI: 10.2174/1874444301507011759

Huaiyuan Wang


Citations: 0

Abstract



This paper studies a metadata-based method for distributed query, in which metadata defines and manages virtual tables that hold the key information of each data source. Two query and optimization solutions are then designed for different data scales, one for ordinary data volumes and one for very large data volumes. For ordinary data, queries are implemented with the virtual table, a syntax analysis tree, and an in-memory database; the query is optimized by copying, moving, and splitting branches of the virtual SQL query syntax tree. For very large data volumes, queries are implemented with Pig, Hadoop, and Python; optimization is achieved by tuning the Pig code, using multiple processes, handling file merging and file upload and download in HDFS, building indexes for high-frequency business queries, and so on.
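The abstract does not give implementation details, so the following is only a minimal sketch of how a metadata-defined virtual table might be used to divide one virtual query into per-source queries. The metadata layout, the names `ColumnMapping`, `VIRTUAL_TABLES`, and `split_by_source`, and the example sources and columns are all assumptions made for illustration, not the paper's design.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Hypothetical metadata entry: where one virtual column physically lives."""
    source: str   # name of the data source
    table: str    # physical table in that source
    column: str   # physical column name


# Hypothetical metadata describing a virtual table named "customer_view".
VIRTUAL_TABLES = {
    "customer_view": {
        "id": ColumnMapping("mysql_crm", "customers", "cust_id"),
        "name": ColumnMapping("mysql_crm", "customers", "full_name"),
        "balance": ColumnMapping("oracle_billing", "accounts", "balance"),
    }
}


def split_by_source(virtual_table, columns):
    """Divide a virtual SELECT into one physical SELECT per (source, table)."""
    mappings = VIRTUAL_TABLES[virtual_table]
    grouped = defaultdict(list)
    for col in columns:
        m = mappings[col]  # raises KeyError if the metadata lacks this column
        grouped[(m.source, m.table)].append(f"{m.column} AS {col}")
    return {
        key: f"SELECT {', '.join(cols)} FROM {key[1]}"
        for key, cols in grouped.items()
    }


plan = split_by_source("customer_view", ["id", "name", "balance"])
for (source, table), sql in plan.items():
    print(source, "->", sql)
```

Each sub-query could then be sent to its own source and the partial results combined, for example in an in-memory database as the abstract mentions; that merge step is omitted here because it would need a join key the abstract does not specify.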
