Role of Machine Learning in ETL Automation

Proceedings of the 21st International Conference on Distributed Computing and Networking. Pub Date: 2020-01-04. DOI: 10.1145/3369740.3372778

K. Mondal, Neepa Biswas, Swati Saha

Citations: 4

Abstract

In the current business landscape, real-time analysis of enterprise data is crucial for an organization's decision-makers to make strategic decisions and stay ahead of competitors. Often, data is already outdated by the time it reaches the user. The organization needs reliable, up-to-the-minute information to make better proactive business decisions and to improve process and organizational efficiency. Real-time availability of information and business-critical reports can be achieved through an automated ETL process. Typically, running a data warehouse in an enterprise requires coordinating many operations across many teams, including application and database teams. It also requires a lot of manual intervention, which is error-prone. Executing all related steps in the correct sequence under the right conditions can be a challenge. An automated ETL process helps to address all these problems. Moreover, preprocessing is a crucial step in making data ready to load into a data warehouse for analysis. Machine learning-based preprocessing can be used to ensure data quality. In this paper, we address the issues faced in traditional data warehouses related to the availability and quality of data. We explain how to automate the ETL process and how machine learning can be leveraged in it so that the quality and availability of data are never compromised and the data reaches the user on a near real-time basis.
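
As a rough illustration of the two ideas in the abstract (running ETL steps in the correct sequence without manual coordination, and screening data quality before loading), the following minimal Python sketch may help. It is not the authors' method: the step names, the sample rows, and the `flag_outliers` helper are hypothetical, and a simple median-based outlier score stands in for whatever learned preprocessing model a real pipeline would use.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold.

    This is a plain statistical stand-in for an ML-based quality check.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, so nothing can be flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def run_pipeline(steps, dependencies, state):
    """Run each step exactly once, after every step it depends on."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        state = steps[name](state)
    return order, state


def extract(state):
    # Hypothetical source rows; a real job would read from a source system.
    return {**state, "rows": [10.0, 11.0, 9.5, 10.5, 250.0, 10.2]}


def clean(state):
    # Drop rows flagged as suspect before they reach the warehouse.
    bad = set(flag_outliers(state["rows"]))
    kept = [v for i, v in enumerate(state["rows"]) if i not in bad]
    return {**state, "rows": kept, "rejected": len(bad)}


def load(state):
    # Stand-in for a warehouse write.
    return {**state, "warehouse": list(state["rows"])}


STEPS = {"extract": extract, "clean": clean, "load": load}
# Each key lists the steps that must finish before it may run.
DEPENDENCIES = {"clean": {"extract"}, "load": {"clean"}}

order, result = run_pipeline(STEPS, DEPENDENCIES, {})
print(order, result["rejected"], result["warehouse"])
```

Declaring the dependencies as data, rather than hard-coding the call order, is one way to keep the sequencing out of human hands; the scheduler derives a valid order from the graph.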

