A PaaS based metadata-driven ETL framework
Liutong Xu, Jia Liao, Ruixue Zhao, Bin Wu
2011 IEEE International Conference on Cloud Computing and Intelligence Systems, 2011-10-13
DOI: 10.1109/CCIS.2011.6045113
Citation count: 5
Abstract
Knowledge discovery has often been used as a background application to motivate many technical problems in ETL research. However, traditional ETL tools face new challenges, including the tremendous volume of data and limited computing capacity. Meanwhile, the MapReduce parallel computing model has been widely adopted in recent years. In this paper, we first analyze the problems of existing ETL tools, propose a metadata-driven ETL service model, and then summarize the types of metadata and their scopes of application. Based on this metadata-driven ETL service model, we propose a concrete ETL framework that combines ETL with the MapReduce algorithmic framework and is provided as a PaaS (Platform as a Service) to meet these requirements. Several significant services of the framework are then discussed. Finally, we describe strategies for improving the flexibility and extensibility of the framework and for promoting the reuse of ETL components and ETL applications. In practice, the model and framework proposed in this paper have shown advantages that open-source and commercial ETL tools lack, and they can handle the processing of large-scale data.
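The abstract does not give the framework's metadata schema or implementation. As a rough illustration of what "metadata-driven" can mean when ETL is combined with the MapReduce model, the following minimal Python sketch runs a map, shuffle and reduce pass in a single process. Every job-specific choice (source layout, per-column transformations, grouping key, aggregated measure) is read from a metadata object rather than hard-coded. All names (`JobMetadata`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and are not taken from the paper; a real deployment would execute the phases on a distributed MapReduce engine rather than in-process.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


# Hypothetical metadata describing one ETL job. The field names are
# illustrative assumptions, not the paper's metadata schema.
@dataclass
class JobMetadata:
    fields: List[str]                                # column names of the delimited source
    delimiter: str                                   # field separator in each source line
    transforms: Dict[str, Callable[[str], object]]   # per-column conversion functions
    group_by: str                                    # key column for the reduce phase
    measure: str                                     # numeric column to aggregate


def map_phase(meta: JobMetadata, lines: Iterable[str]) -> Iterable[Tuple[object, float]]:
    """Extract and transform: parse each line as the metadata says, emit (key, value)."""
    for line in lines:
        values = line.rstrip("\n").split(meta.delimiter)
        if len(values) != len(meta.fields):
            continue  # simple cleansing rule: drop rows with the wrong arity
        row = dict(zip(meta.fields, values))
        for col, fn in meta.transforms.items():
            row[col] = fn(row[col])
        yield row[meta.group_by], float(row[meta.measure])


def shuffle(pairs: Iterable[Tuple[object, float]]) -> Dict[object, List[float]]:
    """Group mapped values by key, as the MapReduce shuffle step would."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[object, List[float]]) -> Dict[object, float]:
    """Aggregate each group (here: a sum) to produce rows ready for loading."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(meta: JobMetadata, lines: Iterable[str]) -> Dict[object, float]:
    return reduce_phase(shuffle(map_phase(meta, lines)))


meta = JobMetadata(
    fields=["region", "product", "amount"],
    delimiter=",",
    transforms={"region": str.upper, "amount": float},
    group_by="region",
    measure="amount",
)
source = ["north,widget,10.5", "south,gadget,4", "north,gadget,2.5", "bad,row"]
result = run_job(meta, source)
print(result)  # {'NORTH': 13.0, 'SOUTH': 4.0}
```

Regrouping the same source by product only requires a different `JobMetadata` value; the map, shuffle and reduce functions stay unchanged, which loosely mirrors the component reuse the abstract aims for.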