M. Naeem, G. Dobbie, Imran Sarwar Bajwa, Gerald Weber
Title: Resource optimization for processing of stream data in data warehouse environment
Venue: International Conference on Advances in Computing, Communications and Informatics
Published: 2012-08-03
DOI: 10.1145/2345396.2345407 (https://doi.org/10.1145/2345396.2345407)
Citation count: 1
Abstract
To meet the growing business demand for up-to-date information, current data integration approaches are moving towards real-time updates. In real-time data integration, updates occurring on the source systems need to be reflected in the data warehouse immediately. One important element of real-time data integration is the join of a continuous incoming data stream with disk-based master data. In this context, a stream-based algorithm called X-HYBRIDJOIN (Extended Hybrid Join) was proposed earlier, with favorable asymptotic runtime behavior. However, its absolute performance was not as good as hoped. In this paper we present results showing that, with proper tuning, the resulting Tuned X-HYBRIDJOIN performs significantly better than the previous X-HYBRIDJOIN, and better than other applicable join operators found in the literature. We present the tuning approach, which is based on measurement techniques and a revised cost model. To evaluate the algorithm's performance, we conduct an experimental study showing that Tuned X-HYBRIDJOIN exhibits the desired performance characteristics.
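
To make the join setting concrete, the following is a minimal Python sketch of a generic join between an incoming stream and partitioned master data. It is not the X-HYBRIDJOIN or Tuned X-HYBRIDJOIN algorithm, whose details the abstract does not give. The function name `stream_master_join`, the tuple layouts, and the `batch_size` parameter are all hypothetical, and in-memory lists stand in for disk partitions.

```python
from collections import defaultdict
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

StreamTuple = Tuple[int, str]  # (join key, payload): hypothetical layout
MasterRow = Tuple[int, str]    # (join key, attribute): hypothetical layout


def stream_master_join(
    stream: Iterable[StreamTuple],
    master_partitions: List[List[MasterRow]],
    batch_size: int,
) -> Iterator[Tuple[int, str, str]]:
    """Join stream tuples with master data, one buffered batch at a time.

    Illustrative only: each inner list of master_partitions stands in for
    one disk partition, and every batch triggers a full pass over them.
    """
    it = iter(stream)
    while True:
        # Buffer the next batch of stream tuples in memory.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # Build a hash table over the buffered stream tuples.
        table: Dict[int, List[str]] = defaultdict(list)
        for key, payload in batch:
            table[key].append(payload)
        # Read master data partition by partition and probe the hash table.
        for partition in master_partitions:
            for key, attribute in partition:
                for payload in table.get(key, ()):
                    yield key, payload, attribute


master = [[(1, "apple"), (2, "banana")], [(3, "cherry")]]
updates = [(2, "sale-a"), (3, "sale-b"), (2, "sale-c"), (9, "sale-d")]
results = list(stream_master_join(updates, master, batch_size=3))
```

In this sketch the size of the stream buffer relative to the size of each master data partition determines how many stream tuples are served per pass over the master data. A memory trade-off of this kind is what a cost model for such a join would typically describe, although the specific tuning used in the paper is not stated in the abstract.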