An intelligent ETL workflow framework based on data partition
Authors: Yingying Tu, Chaozhen Guo
Venue: 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems
Published: 2010-12-06
DOI: 10.1109/ICICISYS.2010.5658640 (https://doi.org/10.1109/ICICISYS.2010.5658640)
ETL tools are an important part of building data warehouses and data centers. For massive data processing, this paper presents an intelligent ETL workflow framework built on distributed computing servers. The framework adds an intelligent control module that collects data on system efficiency, resource usage, and operations, dynamically adjusts the ETL strategy, and partitions the data of larger jobs accordingly. This optimizes the workflow for multi-machine parallel execution, improves operational efficiency, and facilitates error recovery. The intelligent control module consists of a monitor, a knowledge base, and a selector. Horizontal partitioning of the source data is both the foundation of multi-machine parallel execution and its main difficulty.
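As a rough illustration of the idea, the sketch below shows how a selector might pick lightly loaded servers from monitor readings and then horizontally partition source rows across them. It is not the authors' implementation: the abstract does not describe their partitioning scheme, so the names `WorkerStatus`, `select_workers`, `horizontal_partition`, the `max_load` threshold (standing in for a knowledge-base rule), and the modulo-on-key assignment are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class WorkerStatus:
    """Snapshot a monitor might report for one server (hypothetical fields)."""
    name: str
    cpu_load: float  # fraction of CPU in use, 0.0 to 1.0


def select_workers(statuses, max_load=0.8):
    """Selector: keep servers whose load is below a threshold.

    The threshold is an assumed stand-in for a rule from the knowledge base.
    """
    return [s.name for s in statuses if s.cpu_load < max_load]


def horizontal_partition(rows, key, workers):
    """Assign each row to exactly one worker by its integer key modulo the worker count."""
    if not workers:
        raise ValueError("no workers available for partitioning")
    partitions = {w: [] for w in workers}
    for row in rows:
        worker = workers[row[key] % len(workers)]
        partitions[worker].append(row)
    return partitions


# Hypothetical monitor readings and source rows.
statuses = [
    WorkerStatus("etl-1", 0.35),
    WorkerStatus("etl-2", 0.95),  # above the threshold, so it is skipped
    WorkerStatus("etl-3", 0.10),
]
rows = [{"id": i, "amount": i * 10} for i in range(6)]

workers = select_workers(statuses)
partitions = horizontal_partition(rows, "id", workers)
for worker, part in partitions.items():
    print(worker, [r["id"] for r in part])
```

Because every row lands in exactly one partition, each server can run the same extract-transform-load steps on its own slice, and a failed slice can be rerun without repeating the whole job. A real system would likely weigh data skew and per-server capacity rather than use a plain modulo.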