Extract-Load-Transform (ELT) Process Runtime Analysis and Optimization
A. E. Zvonarev, D. S. Gudilin, Dmitriy A. Lychagin, B. Goryachkin
2023 5th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE), published 16 March 2023. DOI: 10.1109/REEPE57272.2023.10086728
The article discusses algorithms for optimizing the transformation stage of an ELT process in which the transformations are implemented as procedures in the PostgreSQL DBMS and parallelized with Python tooling. The baseline for comparison is the simplest form of the transformation process, in which the individual procedures are executed one after another. The first proposed algorithm applies simple parallelization: procedures that do not depend on each other are executed concurrently. The second algorithm improves on the first by optimizing step by step and additionally parallelizing blocks of chained, dependent procedures. The main evaluation criterion is the execution time of the entire chain of procedures. The study found that the improved parallelization algorithm achieves the shortest execution time for the entire chain of the data transformation step.
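The abstract does not spell out how the three variants schedule their work, so the sketch below is only one possible reading, under stated assumptions: the procedures form an acyclic dependency graph, each procedure's duration is known, and the worker pool is unlimited. "Simple parallelization" is modeled here as wave-by-wave execution of all procedures at the same dependency depth, and the improved variant as dependency-driven scheduling, where each chain advances as soon as its own prerequisites complete. The pipeline, the procedure names, and the timings are hypothetical, and the functions compute modeled wall-clock time rather than calling PostgreSQL.

```python
from collections import defaultdict

# Hypothetical pipeline: procedure -> (duration in seconds, prerequisite procedures).
# Names and durations are illustrative only; they are not taken from the paper.
PIPELINE = {
    "stg_customers": (4, []),
    "stg_orders": (6, []),
    "stg_products": (2, []),
    "dim_customer": (3, ["stg_customers"]),
    "dim_product": (1, ["stg_products"]),
    "fct_sales": (5, ["stg_orders", "dim_customer", "dim_product"]),
}


def sequential_time(pipeline):
    """Baseline: every procedure runs one after another."""
    return sum(duration for duration, _ in pipeline.values())


def _depths(pipeline):
    """Depth of each procedure in the dependency DAG (0 = no prerequisites).
    Raises ValueError if the graph contains a cycle."""
    depth, visiting = {}, set()

    def visit(name):
        if name in depth:
            return depth[name]
        if name in visiting:
            raise ValueError(f"dependency cycle at {name!r}")
        visiting.add(name)
        deps = pipeline[name][1]
        depth[name] = 1 + max((visit(d) for d in deps), default=-1)
        visiting.discard(name)
        return depth[name]

    for name in pipeline:
        visit(name)
    return depth


def wave_time(pipeline):
    """Assumed simple parallelization: procedures at the same depth run together,
    and each wave waits for its slowest member before the next wave starts."""
    waves = defaultdict(list)
    for name, level in _depths(pipeline).items():
        waves[level].append(pipeline[name][0])
    return sum(max(durations) for durations in waves.values())


def dependency_driven_time(pipeline):
    """Assumed chain-aware parallelization: each procedure starts as soon as its
    own prerequisites finish, so independent chains never wait on each other."""
    _depths(pipeline)  # validate that the graph is acyclic before recursing
    finish = {}

    def done_at(name):
        if name not in finish:
            duration, deps = pipeline[name]
            finish[name] = duration + max((done_at(d) for d in deps), default=0)
        return finish[name]

    return max(done_at(name) for name in pipeline)


if __name__ == "__main__":
    print(sequential_time(PIPELINE))         # 21
    print(wave_time(PIPELINE))               # 14
    print(dependency_driven_time(PIPELINE))  # 12
```

With these made-up timings the modeled totals are 21 s sequential, 14 s wave-based, and 12 s dependency-driven: `dim_customer` can start at 4 s, right after `stg_customers`, instead of waiting for `stg_orders` to finish at 6 s. In a real run, each concurrently executing procedure would need its own worker and its own database connection, because a single PostgreSQL connection executes only one statement at a time.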