Optimization of Data Warehouse Architecture to Improve Information System Performance

2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE). Pub Date: 2023-02-16. DOI: 10.1109/ICCoSITE57641.2023.10127721

Suriansyah B, A. A. Ilham, A. W. Paundu


Citations: 0

Abstract

Data volumes grow daily, so the data stored in a data warehouse keeps accumulating. When that data is displayed on a dashboard or information system, performance is slow because the queries that load data from the warehouse into the information system access all of the data stored in the warehouse tables. Loading speed therefore degrades as the data grows, and the data warehouse must be optimized so that the load process stays light despite that growth. In this research, a scheduling algorithm is created in Hadoop whose job is to execute the extract and transform steps and load summary data into several tables. The aim is to streamline and optimize the Extract, Transform, Load (ETL) process into the data warehouse and to reduce the volume of data held in any one table. Each table is then indexed on its primary key so that joins across several tables execute quickly. In a test that ran queries with the same goal against optimized and unoptimized tables, the optimized tables had a query time of 1.418 seconds, while the unoptimized tables had a query time of 2.418 seconds. A second test measured the speed of loading data into the information system by comparing the throughput of the optimized and unoptimized systems, and found an average throughput difference of 85%. These results indicate that the speed of loading data into the information system was successfully optimized.
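
The summary-table idea in the abstract can be illustrated with a minimal sketch. This is not the authors' Hadoop implementation: it assumes an in-memory SQLite database as a stand-in warehouse, and the table names (sales_raw, sales_daily), the columns, and the function names are all hypothetical. It shows an ETL step that aggregates a raw table into a smaller summary table keyed by a primary key, so that the information system can read the summary instead of scanning every raw row.

```python
import sqlite3


def build_warehouse(rows):
    """Create an in-memory database with a raw table and an empty summary table.

    Assumption: SQLite stands in for the warehouse; the schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_raw (region TEXT, day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", rows)
    # One row per (region, day). The primary key gives the table an index,
    # so lookups and joins on these columns do not need a full scan.
    conn.execute(
        "CREATE TABLE sales_daily ("
        " region TEXT, day TEXT, total REAL, n INTEGER,"
        " PRIMARY KEY (region, day))"
    )
    conn.commit()
    return conn


def refresh_summary(conn):
    """ETL step that a scheduler would run periodically: rebuild the summary."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_daily")
        conn.execute(
            "INSERT INTO sales_daily (region, day, total, n) "
            "SELECT region, day, SUM(amount), COUNT(*) "
            "FROM sales_raw GROUP BY region, day"
        )


def region_total_raw(conn, region):
    """Unoptimized path: aggregate over every raw row for the region."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(amount), 0.0) FROM sales_raw WHERE region = ?",
        (region,),
    )
    return cur.fetchone()[0]


def region_total_summary(conn, region):
    """Optimized path: aggregate over the much smaller summary table."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(total), 0.0) FROM sales_daily WHERE region = ?",
        (region,),
    )
    return cur.fetchone()[0]


def summary_row_count(conn):
    """Number of rows currently held in the summary table."""
    return conn.execute("SELECT COUNT(*) FROM sales_daily").fetchone()[0]
```

Both query paths return the same total, but the summary path reads one row per region and day rather than one row per transaction. In a real deployment the refresh step would be triggered by a scheduler at a fixed interval; here it would simply be called directly after loading data.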
