{"title":"Performance Tuning in Distributed Processing of ETL","authors":"Ping Yang, Zaiying Liu, Jun Ni","doi":"10.1109/ICICSE.2013.24","DOIUrl":null,"url":null,"abstract":"Extract, transform, and load (ETL) is a very common and important technology for building data warehouse includes business intelligence. When people issue a very complex SQL query to acquit data from a transaction system into a data warehouse, it involves many procedures including table-joining, sort, and aggregation. Such procedures require significant retrieving step and huge data transferring from tables. The intensive querying very often causes performance issues to be concerned. Moreover, it commonly generates negative impacts on data instance resources. How to improve the performance for ETL becomes critical and challenging. This paper presents a parallel processing solution that splitting big and complex SQL query into small pieces in distributed computing manor. The proposed method aims at reducing cost of computation, while ensuring data integrity among joined tables. The innovative idea can be verified through selected test-beds of performance tuning.","PeriodicalId":111647,"journal":{"name":"2013 Seventh International Conference on Internet Computing for Engineering and Science","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Seventh International Conference on Internet Computing for Engineering and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICICSE.2013.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Extract, transform, and load (ETL) is a very common and important technology for building data warehouse includes business intelligence. When people issue a very complex SQL query to acquit data from a transaction system into a data warehouse, it involves many procedures including table-joining, sort, and aggregation. Such procedures require significant retrieving step and huge data transferring from tables. The intensive querying very often causes performance issues to be concerned. Moreover, it commonly generates negative impacts on data instance resources. How to improve the performance for ETL becomes critical and challenging. This paper presents a parallel processing solution that splitting big and complex SQL query into small pieces in distributed computing manor. The proposed method aims at reducing cost of computation, while ensuring data integrity among joined tables. The innovative idea can be verified through selected test-beds of performance tuning.