Role of Machine Learning in ETL Automation
K. Mondal, Neepa Biswas, Swati Saha
Proceedings of the 21st International Conference on Distributed Computing and Networking
Published: 2020-01-04
DOI: 10.1145/3369740.3372778 (https://doi.org/10.1145/3369740.3372778)
Citations: 4
Abstract
In the current business landscape, real-time analysis of enterprise data is crucial for an organization's decision-makers to make strategic decisions and stay ahead of competitors. Often, data is already outdated by the time it reaches the user. Organizations need reliable, up-to-the-minute information to make better proactive business decisions and to improve process and organizational efficiency. Real-time availability of information and business-critical reports can be achieved through an automated ETL process. Typically, running a data warehouse in an enterprise requires coordinating many operations across many teams, including application and database teams. It also requires a lot of manual intervention, which is error-prone. Executing all related steps in the correct sequence under the right conditions can be a challenge. An automated ETL process helps address all of these problems. Moreover, preprocessing is a crucial step in making data ready to load into a data warehouse for analysis. Machine learning-based preprocessing can be used to ensure data quality. In this paper, we address the issues faced in traditional data warehouses related to the availability and quality of data. We explain how to automate the ETL process and how machine learning can be leveraged within it so that the quality and availability of data are never compromised and data reaches the user on a near real-time basis.