Enhancement Techniques for Data Warehouse Staging Area

Mahmoud El-Wessimy, Hoda M. O. Mokhtar, O. Hegazy
International Journal of Data Mining & Knowledge Management Process, published 2013-11-30. DOI: 10.5121/IJDKP.2013.3601
Citations: 1

Abstract

Poor performance can turn an otherwise successful data warehousing project into a failure. Consequently, researchers have made several attempts to address the problem of scheduling the Extract-Transform-Load (ETL) process. In this paper we present several approaches for enhancing the extract, transform, and load stages of data warehousing. We focus on improving the performance of the extract and transform phases by proposing two algorithms that reduce the time needed in each phase by exploiting hidden semantic information in the data. Using this semantic information, a large volume of useless data can be pruned at an early design stage. We also address the problem of scheduling the execution of ETL activities, with the goal of minimizing ETL execution time. To explore this area, we select three scheduling techniques for ETL. Finally, we experimentally evaluate their execution times in the sales domain to understand the impact of adopting each of them and to identify the one that yields the greatest performance improvement.
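The abstract does not name the three scheduling techniques that were compared. Purely as a point of reference, the Python sketch below shows one standard way to attack the same goal of minimizing ETL execution time: greedy list scheduling of an ETL activity graph on a fixed number of parallel workers, where ready activities on the longest remaining (critical) path are started first. This is not presented as the authors' method; the activity names, durations, and the `list_schedule` and `critical_path_priority` helpers are hypothetical.

```python
import heapq
from collections import defaultdict


def _successors(deps):
    """Invert a {activity: [predecessors]} map into {activity: [successors]}."""
    succ = defaultdict(list)
    for act, preds in deps.items():
        for p in preds:
            succ[p].append(act)
    return succ


def critical_path_priority(durations, deps):
    """For each activity, the longest duration-weighted path from it to any sink."""
    succ = _successors(deps)
    memo = {}

    def level(a):
        if a not in memo:
            memo[a] = durations[a] + max((level(s) for s in succ[a]), default=0)
        return memo[a]

    return {a: level(a) for a in durations}


def list_schedule(durations, deps, workers):
    """Greedy list scheduling of a DAG of ETL activities on `workers` parallel slots.

    Whenever a slot is free, start the ready activity with the highest
    critical-path priority. Returns (makespan, {activity: (start, finish)}).
    """
    prio = critical_path_priority(durations, deps)
    succ = _successors(deps)
    remaining = {a: len(deps.get(a, ())) for a in durations}  # unfinished predecessors
    ready = [(-prio[a], a) for a in durations if remaining[a] == 0]
    heapq.heapify(ready)
    running = []  # min-heap of (finish_time, activity)
    free, now, schedule = workers, 0, {}

    while ready or running:
        # Fill every free slot with the highest-priority ready activity.
        while ready and free > 0:
            _, a = heapq.heappop(ready)
            schedule[a] = (now, now + durations[a])
            heapq.heappush(running, (now + durations[a], a))
            free -= 1
        # Advance the clock to the next completion; release all activities finishing then.
        now, done = heapq.heappop(running)
        finished = [done]
        while running and running[0][0] == now:
            finished.append(heapq.heappop(running)[1])
        for d in finished:
            free += 1
            for s in succ[d]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    heapq.heappush(ready, (-prio[s], s))

    return max(f for _, f in schedule.values()), schedule


if __name__ == "__main__":
    # Hypothetical sales-domain ETL flow (durations in arbitrary time units).
    durations = {"extract_orders": 4, "extract_customers": 2, "clean_orders": 3,
                 "clean_customers": 1, "join": 2, "aggregate": 2, "load": 1}
    deps = {"clean_orders": ["extract_orders"],
            "clean_customers": ["extract_customers"],
            "join": ["clean_orders", "clean_customers"],
            "aggregate": ["join"], "load": ["aggregate"]}
    for w in (1, 2):
        makespan, _ = list_schedule(durations, deps, workers=w)
        print(f"workers={w}: makespan={makespan}")
```

With one worker the makespan is simply the sum of all durations; with two workers this hypothetical flow finishes in the length of its critical path (extract, clean, join, aggregate, and load for orders), which is a lower bound for any schedule. Prioritizing by critical path is a common heuristic choice because delaying activities on that path delays the whole flow.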