Research of Distributed Query and Optimization Method Based on Metadata
Huaiyuan Wang
The Open Automation and Control Systems Journal, published 2015-10-20
DOI: 10.2174/1874444301507011759 (https://doi.org/10.2174/1874444301507011759)
Citation count: 0
Abstract
This paper studies a metadata-based distributed query method in which metadata is used to define and manage virtual tables containing key information about the data sources. Two query and optimization solutions are then designed for different data scales: one for ordinary data volumes and one for very large data volumes. For ordinary data, queries are implemented using the virtual tables, a syntax analysis tree, and an in-memory database; the query is optimized by copying, moving, and splitting branches of the virtual SQL query syntax tree. For very large data volumes, Pig, Hadoop, and Python are used to implement the query; optimization is achieved by tuning the Pig code, using multiple processes, merging files and handling file uploads and downloads in HDFS, and building indexes for high-frequency business queries, among other measures.
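The virtual-table idea for ordinary data volumes can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `SourceMapping`, `VirtualTable`, `build_source_query`, and `query_virtual`, the two demo sources, and the use of SQLite as the in-memory database are all assumptions made for illustration. The sketch stores metadata that maps each virtual column to a physical column per source, rewrites a projection on the virtual table into one query per source, and merges the rows in an in-memory database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SourceMapping:
    """Hypothetical metadata for one physical source behind a virtual table."""
    source: str    # key into the connections dict
    table: str     # physical table name in that source
    columns: dict  # virtual column name -> physical column name


@dataclass
class VirtualTable:
    """Hypothetical virtual table: a name plus its per-source mappings."""
    name: str
    mappings: list


def build_source_query(mapping, virtual_columns):
    """Rewrite a projection on virtual columns into SQL for one source."""
    select_list = ", ".join(
        f"{mapping.columns[col]} AS {col}" for col in virtual_columns
    )
    return f"SELECT {select_list} FROM {mapping.table}"


def query_virtual(vtable, virtual_columns, connections):
    """Run the rewritten query on every source and merge rows in memory."""
    merged = sqlite3.connect(":memory:")  # stands in for the memory database
    col_defs = ", ".join(virtual_columns)
    merged.execute(f"CREATE TABLE {vtable.name} ({col_defs})")
    placeholders = ", ".join("?" for _ in virtual_columns)
    for mapping in vtable.mappings:
        sql = build_source_query(mapping, virtual_columns)
        rows = connections[mapping.source].execute(sql).fetchall()
        merged.executemany(
            f"INSERT INTO {vtable.name} VALUES ({placeholders})", rows
        )
    result = merged.execute(
        f"SELECT {col_defs} FROM {vtable.name} ORDER BY {virtual_columns[0]}"
    ).fetchall()
    merged.close()
    return result


def make_demo_sources():
    """Two separate in-memory databases simulating distributed sources."""
    a = sqlite3.connect(":memory:")
    a.execute("CREATE TABLE cust (cid INTEGER, cname TEXT)")
    a.executemany("INSERT INTO cust VALUES (?, ?)", [(1, "Ann"), (3, "Cho")])
    b = sqlite3.connect(":memory:")
    b.execute("CREATE TABLE clients (id INTEGER, full_name TEXT)")
    b.executemany("INSERT INTO clients VALUES (?, ?)", [(2, "Bob")])
    return {"site_a": a, "site_b": b}


# Assumed example metadata: one virtual table backed by two differently
# named physical tables.
CUSTOMER = VirtualTable(
    name="customer",
    mappings=[
        SourceMapping("site_a", "cust", {"id": "cid", "name": "cname"}),
        SourceMapping("site_b", "clients", {"id": "id", "name": "full_name"}),
    ],
)

if __name__ == "__main__":
    print(query_virtual(CUSTOMER, ["id", "name"], make_demo_sources()))
```

In this sketch the caller only sees the virtual column names `id` and `name`; the per-source renaming lives entirely in the metadata, so adding a third source would mean adding one more `SourceMapping` rather than changing the query code. The syntax-tree optimizations mentioned in the abstract (copying, moving, and splitting branches) are not modeled here.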