Role of Machine Learning in ETL Automation

K. Mondal, Neepa Biswas, Swati Saha
Published in: Proceedings of the 21st International Conference on Distributed Computing and Networking
Publication date: 2020-01-04
DOI: 10.1145/3369740.3372778 (https://doi.org/10.1145/3369740.3372778)
Citations: 4

Abstract

In the current business landscape, real-time analysis of enterprise data is crucial for an organization's decision-makers to make strategic decisions and stay ahead of competitors. Too often, data is already outdated by the time it reaches the user. An organization needs reliable, up-to-the-minute information to make better, proactive business decisions and to improve its processes and organizational efficiency. Real-time availability of information and business-critical reports can be achieved through an automated ETL process. Typically, running a data warehouse in an enterprise requires coordinating many operations across many teams, including application and database teams. It also requires a lot of manual intervention, which is error-prone. Executing all related steps in the correct sequence under the right conditions can be a challenge. An automated ETL process helps to address all these problems. Moreover, preprocessing is a crucial step in making data ready to load into a data warehouse for analysis. Machine learning-based preprocessing can be used to ensure the quality of the data. In this paper, we address the issues that traditional data warehouses face regarding the availability and quality of data. We explain how to automate the ETL process and how machine learning can be leveraged within it so that the quality and availability of data are never compromised and data reaches the user on a near real-time basis.
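As a rough illustration of the two ideas in the abstract (running ETL steps automatically in a fixed order, and screening data quality before loading), the sketch below may help. It is not the authors' method: the function names, the sample data, and the `amount` field are all hypothetical, and a simple median-absolute-deviation outlier filter stands in for whatever learned data-quality model a real pipeline would use.

```python
from statistics import median

Row = dict[str, float]


def extract() -> list[Row]:
    # Hypothetical source rows; a real pipeline would read from an operational system.
    return [{"amount": v} for v in (10.0, 12.0, 11.0, 13.0, 500.0, 9.0)]


def drop_outliers(rows: list[Row], field: str = "amount",
                  threshold: float = 3.5) -> list[Row]:
    """Keep rows whose modified z-score (median/MAD based) is within threshold.

    This is a statistical stand-in for an ML-based quality check (assumption).
    """
    values = [r[field] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(rows)  # no spread to judge against; keep everything
    return [r for r in rows
            if 0.6745 * abs(r[field] - med) / mad <= threshold]


def run_pipeline() -> tuple[list[Row], list[str]]:
    """Run extract -> quality check -> load in a fixed order, logging each step."""
    warehouse: list[Row] = []  # in-memory stand-in for the data warehouse
    log: list[str] = []

    rows = extract()
    log.append(f"extract: {len(rows)} rows")

    clean = drop_outliers(rows)
    log.append(f"transform: kept {len(clean)} of {len(rows)} rows")

    warehouse.extend(clean)
    log.append(f"load: {len(warehouse)} rows in warehouse")
    return warehouse, log


if __name__ == "__main__":
    loaded, messages = run_pipeline()
    for line in messages:
        print(line)
```

Because the steps are chained in code rather than triggered by hand, the ordering cannot be skipped or reversed, which is the kind of manual-sequencing error the abstract describes.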