{"title":"Data integration from traditional to big data: main features and comparisons of ETL approaches","authors":"Afef Walha, Faiza Ghozzi, Faiez Gargouri","doi":"10.1007/s11227-024-06413-1","DOIUrl":null,"url":null,"abstract":"<p>Data integration combines information from different sources to provide a comprehensive view for making informed business decisions. The ETL (Extract, Transform, and Load) process is essential in data integration. In the past two decades, modeling the ETL process has become a priority for effectively managing information. This paper aims to explore ETL approaches to help researchers and organizational stakeholders overcome challenges, especially in Big Data integration. It offers a comprehensive overview of ETL methods, from traditional to Big Data, and discusses their advantages, limitations, and the primary trends in Big Data integration. The study emphasizes that many technologies have been integrated into ETL steps for data collection, storage, processing, querying, and analysis without proper modeling. Therefore, more generic and customized design modeling of the ETL steps should be carried out to ensure reusability and flexibility. The paper summarizes the exploration of ETL modeling, focusing on Big Data scalability and processing trends. It also identifies critical dilemmas, such as ensuring compatibility across multiple sources and dealing with large volumes of Big Data. Furthermore, it suggests future directions in Big Data integration by leveraging advanced artificial intelligence processing and storage systems to ensure consistency, efficiency, and data integrity.</p>","PeriodicalId":501596,"journal":{"name":"The Journal of Supercomputing","volume":"16 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Journal of Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s11227-024-06413-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Data integration combines information from different sources to provide a comprehensive view that supports informed business decisions. The ETL (Extract, Transform, and Load) process is central to data integration, and over the past two decades, modeling it has become a priority for managing information effectively. This paper surveys ETL approaches to help researchers and organizational stakeholders overcome integration challenges, particularly in Big Data settings. It offers a comprehensive overview of ETL methods, from traditional to Big Data approaches, and discusses their advantages, limitations, and the main trends in Big Data integration. The study emphasizes that many technologies have been incorporated into the ETL steps of data collection, storage, processing, querying, and analysis without proper design modeling; more generic yet customizable modeling of these steps is therefore needed to ensure reusability and flexibility. The paper summarizes the state of ETL modeling, focusing on scalability and processing trends in Big Data, and identifies open challenges such as ensuring compatibility across multiple heterogeneous sources and handling very large data volumes. Finally, it suggests future directions for Big Data integration that leverage advanced artificial intelligence processing and storage systems to ensure consistency, efficiency, and data integrity.
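The paper surveys ETL approaches rather than prescribing a single implementation. For readers unfamiliar with the three steps the abstract names, the sketch below shows one minimal, conventional ETL pipeline in Python: extract raw records from a CSV source, transform them by dropping incomplete rows and normalizing types, and load the result into a relational store. The file name, column names, and SQLite target are illustrative assumptions, not details from the paper.

```python
# Minimal illustrative ETL pipeline: CSV -> cleaned records -> SQLite.
# All names (sales.csv, order_id/region/amount, warehouse.db) are hypothetical.
import csv
import sqlite3


def extract(path):
    """Extract: stream raw records from a CSV source."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.DictReader(f)


def transform(rows):
    """Transform: drop incomplete records and normalize values/types."""
    for row in rows:
        if not row.get("amount"):            # skip records missing a required field
            continue
        yield (
            row["order_id"].strip(),
            row["region"].strip().upper(),   # normalize categorical values
            float(row["amount"]),            # enforce a numeric type
        )


def load(records, db_path):
    """Load: write cleaned records into the target store."""
    with sqlite3.connect(db_path) as conn:   # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sales (order_id TEXT, region TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)


if __name__ == "__main__":
    load(transform(extract("sales.csv")), "warehouse.db")
```

Because each step is a generator feeding the next, records flow through the pipeline one at a time; this streaming structure is one simple way the ETL steps stay decoupled and reusable, which is the design property the abstract argues proper ETL modeling should guarantee at scale.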