ETLator - a scripting ETL framework
Authors: Miran Radonic, I. Mekterović
Published in: 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
Publication date: 2017-05-01
DOI: https://doi.org/10.23919/MIPRO.2017.7973632
Citations: 5
Abstract
ETL (Extract, Transform, Load) is the industry-standard term for extracting data, transforming it, and loading it into a data warehouse (DW). ETL is the most resource-demanding part of a DW implementation and typically has to be evolved and maintained for the lifetime of the DW. To facilitate the development and maintenance of ETL processes, many ETL tools have been developed that feature graphical user interfaces (GUIs) and various built-in functionality (parallelism, logging, rich transformation libraries, documentation generation, etc.). The downside of such GUI ETL tools is that development relies heavily on mouse operations rather than on writing code, which feels unnatural to some developers, especially when there are many similar, repetitive tasks. In this paper we present an alternative approach: ETLator, an ETL framework based on the Python scripting language, in which ETL tasks are defined by writing Python code. ETLator implements various typical ETL transformations and lets the user simply and efficiently define complex ETL tasks with multiple sources and parallel tasks while leveraging the full flexibility of Python. ETLator also provides logging and can document ETL tasks by generating data flow images. On a test case we show that ETLator simplifies ETL development and rivals the GUI approach.
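To make the idea of script-defined ETL tasks concrete, here is a minimal sketch in plain Python. It does not use ETLator's actual API, which the abstract does not show. Every name in it (`extract`, `transform`, `load`, `run_pipeline`, `to_dot`, the sample data) is hypothetical, and only the standard library is used. It illustrates the four features the abstract mentions: multiple sources, parallel tasks, logging, and a textual data flow description (Graphviz DOT) that could be rendered as an image.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl_sketch")

# Hypothetical in-memory sources standing in for database tables or files.
CUSTOMERS = [{"id": 1, "name": " Ana "}, {"id": 2, "name": "Ivan"}]
ORDERS = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 5.5},
    {"customer_id": 2, "amount": 7.0},
]


def extract(name, rows):
    """Extract step: copy the source rows and log how many were read."""
    log.info("extracted %d rows from %s", len(rows), name)
    return [dict(r) for r in rows]


def transform(customers, orders):
    """Transform step: trim names and sum order amounts per customer."""
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [
        {"id": c["id"], "name": c["name"].strip(), "total": totals.get(c["id"], 0.0)}
        for c in customers
    ]


def load(rows, target):
    """Load step: append rows to a list that stands in for a DW table."""
    target.extend(rows)
    log.info("loaded %d rows", len(rows))


def run_pipeline(target):
    # The two sources are independent, so they are extracted in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_customers = pool.submit(extract, "customers", CUSTOMERS)
        f_orders = pool.submit(extract, "orders", ORDERS)
        customers, orders = f_customers.result(), f_orders.result()
    load(transform(customers, orders), target)


def to_dot(edges):
    """Describe the data flow as Graphviz DOT text for documentation."""
    lines = ["digraph etl {"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical data flow edges matching the pipeline above.
EDGES = [
    ("customers", "transform"),
    ("orders", "transform"),
    ("transform", "dw_customer_totals"),
]

warehouse = []
run_pipeline(warehouse)
dot_text = to_dot(EDGES)
```

Because the pipeline is ordinary code, repetitive tasks can be generated with loops or functions rather than repeated by hand in a GUI, which is the motivation the abstract gives for the scripting approach.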