Research of Distributed Query and Optimization Method Based on Metadata
Huaiyuan Wang. The Open Automation and Control Systems Journal, published 2015-10-20. DOI: 10.2174/1874444301507011759
This paper studies a metadata-based distributed query method in which metadata is used to define and manage virtual tables that hold the key information of the underlying data sources. Two query and optimization solutions are then designed for different data scales: one for common data and one for huge data volumes. For common data queries, the method is built on the virtual tables, a syntax analysis tree, and an in-memory database, and queries are optimized by copying, moving, and dividing branches of the virtual SQL query syntax tree. For queries over huge amounts of data, Pig, Hadoop, and Python are used to implement the query, and big-data performance is optimized by tuning the Pig code, using multiple processes, merging files and handling file uploads and downloads in HDFS, building indexes for high-frequency business queries, and similar measures.
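The abstract does not spell out the data structures behind the virtual table or the branch operations, so the following is only an illustrative sketch under stated assumptions. A hypothetical `VirtualTable`/`Fragment` metadata record maps each virtual column to a physical column in one source; the query syntax tree is reduced to a projection list plus AND-ed predicates; and "dividing", "moving", and "copying" branches are read as splitting the query per source, pushing each predicate down to the source that owns its column, and copying the join key into every branch. In-memory SQLite databases (Python's standard `sqlite3` module) stand in both for the remote sources and for the in-memory database that merges the partial results.

```python
import sqlite3
from dataclasses import dataclass

# Operators allowed in pushed-down predicates (whitelisted so the generated SQL stays safe).
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


@dataclass
class Fragment:
    """Hypothetical metadata: the part of a virtual table stored in one data source."""
    table: str                  # physical table name in that source
    columns: dict[str, str]     # virtual column name -> physical column name


@dataclass
class VirtualTable:
    """Hypothetical metadata record for a virtual table spread over several sources."""
    name: str
    key: str                    # virtual key column present in every fragment
    fragments: dict[str, Fragment]  # source name -> fragment


def split_query(vt, select_cols, predicates):
    """Divide a conjunctive query on a virtual table into one sub-query per source.

    predicates is a list of (virtual_column, op, value) tuples joined by AND.
    Returns {source: (sql, params, output_columns)}.
    """
    plans = {}
    for src, frag in vt.fragments.items():
        # Copy the key into every branch so partial results can be joined again later.
        cols = [vt.key] + [c for c in select_cols if c != vt.key and c in frag.columns]
        # Move each predicate into the branch whose source owns the column.
        preds = [p for p in predicates if p[0] in frag.columns]
        if len(cols) == 1 and not preds:
            continue  # this source contributes neither columns nor filters
        for _, op, _ in preds:
            if op not in ALLOWED_OPS:
                raise ValueError(f"unsupported operator: {op}")
        sql = "SELECT " + ", ".join(f"{frag.columns[c]} AS {c}" for c in cols)
        sql += f" FROM {frag.table}"
        if preds:
            sql += " WHERE " + " AND ".join(f"{frag.columns[c]} {op} ?" for c, op, _ in preds)
        plans[src] = (sql, [v for _, _, v in preds], cols)
    return plans


def execute(vt, select_cols, predicates, connections):
    """Run each branch on its source, load the results into an in-memory DB, and join them."""
    mem = sqlite3.connect(":memory:")
    parts = []
    for i, (src, (sql, params, cols)) in enumerate(split_query(vt, select_cols, predicates).items()):
        rows = connections[src].execute(sql, params).fetchall()
        part = f"part{i}"
        mem.execute(f"CREATE TABLE {part} ({', '.join(cols)})")
        mem.executemany(f"INSERT INTO {part} VALUES ({', '.join('?' * len(cols))})", rows)
        parts.append(part)
    # Inner-join the partial results on the shared key (AND semantics across sources).
    joined = parts[0] + "".join(f" JOIN {p} USING ({vt.key})" for p in parts[1:])
    return mem.execute(f"SELECT {', '.join(select_cols)} FROM {joined} ORDER BY {vt.key}").fetchall()


# Two toy sources standing in for separate databases.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE people (pid INTEGER, full_name TEXT, city TEXT)")
crm.executemany("INSERT INTO people VALUES (?, ?, ?)",
                [(1, "Li", "Beijing"), (2, "Wang", "Shanghai"), (3, "Zhao", "Beijing")])
bank = sqlite3.connect(":memory:")
bank.execute("CREATE TABLE accounts (cust_id INTEGER, amount REAL)")
bank.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 120.0), (2, 80.0), (3, 40.0)])

customer = VirtualTable("customer", "id", {
    "crm": Fragment("people", {"id": "pid", "name": "full_name", "city": "city"}),
    "bank": Fragment("accounts", {"id": "cust_id", "balance": "amount"}),
})
rows = execute(customer, ["id", "name", "balance"],
               [("city", "=", "Beijing"), ("balance", ">", 50)],
               {"crm": crm, "bank": bank})
print(rows)  # [(1, 'Li', 120.0)]
```

Pushing each filter to its own source means only matching rows (two from `crm`, two from `bank` here) travel to the merging database, rather than whole tables; whether the paper's tree operations reduce transferred data in the same way is an assumption of this sketch.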
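For the big-data path the abstract only names the techniques. One well-known reason to merge files before uploading them to HDFS is that every file and block consumes NameNode memory, so many small files are costly. Under that assumption, the minimal sketch below greedily packs small local files into chunks of a target size, concatenates each chunk, and builds (but does not run) one `hdfs dfs -put` command per merged file. The function names, the `part-NNNNN.txt` naming, and the newline handling are hypothetical choices, not the author's code.

```python
import os
from pathlib import Path


def plan_chunks(paths, target_bytes):
    """Greedily pack files, in the given order, into chunks whose total size is at most
    target_bytes; a single file larger than target_bytes gets a chunk of its own."""
    chunks, current, size = [], [], 0
    for p in paths:
        n = os.path.getsize(p)
        if current and size + n > target_bytes:
            chunks.append(current)  # close the chunk before it would overflow
            current, size = [], 0
        current.append(p)
        size += n
    if current:
        chunks.append(current)
    return chunks


def merge_chunk(paths, out_path):
    """Concatenate the files of one chunk into out_path, one after another."""
    with open(out_path, "wb") as out:
        for p in paths:
            data = Path(p).read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")  # keep the last line of one file apart from the next file
    return Path(out_path)


def merge_small_files(paths, out_dir, target_bytes, hdfs_dir):
    """Merge small local files into larger parts and return one upload command per part.

    The commands are only built, not executed, so this runs without a Hadoop cluster.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for i, chunk in enumerate(plan_chunks(paths, target_bytes)):
        merged = merge_chunk(chunk, out_dir / f"part-{i:05d}.txt")
        commands.append(["hdfs", "dfs", "-put", str(merged), f"{hdfs_dir}/{merged.name}"])
    return commands
```

Because each chunk is merged independently, the loop over chunks is where the multiple processes mentioned in the abstract could plausibly be applied, for example by mapping `merge_chunk` over the chunks with a `multiprocessing.Pool`; the sketch stays single-process to keep it short.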