Autonomous database partitioning using data mining on single computers and cluster computers

Proceedings. International Database Engineering and Applications Symposium · Pub Date: 2012-08-08 · DOI: 10.1145/2351476.2351481

Liangzhe Li, L. Gruenwald

Pages: 32-41


Abstract

One of the most important metrics of database system performance is query response time, which is composed of I/O time and CPU time. I/O time is determined by the amount of data read from and written to disk and by how that data is laid out on disk. CPU time is determined by how the database system executes the query operations. Query response time can therefore be reduced by lowering I/O time, CPU time, or both. Because retrieving data from disk is much slower than retrieving it from main memory, a common way to reduce I/O time is to cluster data on disk so that queries access only relevant data. This paper introduces AutoClust, an efficient algorithm for automatic database attribute clustering (also called automatic database vertical partitioning) on single computers as well as cluster computers. AutoClust is based on closed item sets mined from queries and their attributes using association rule mining. The paper then presents experimental results that compare AutoClust with a baseline algorithm on both single computers and cluster computers, using the TPC-H benchmark running on major commercial database systems. The experiments show that AutoClust achieves better query costs on both types of computers.
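To make the notion of closed item sets concrete, the following Python sketch treats each query as the set of attributes it references and enumerates the closed attribute sets by brute force. It is an illustration of the concept only, not the AutoClust algorithm itself: the function name `closed_itemsets`, the `min_support` parameter, and the three-query workload are assumptions introduced here, and the column names are borrowed from the TPC-H LINEITEM table purely as labels. An attribute set is closed when no proper superset is referenced by exactly the same number of queries.

```python
from itertools import combinations


def closed_itemsets(queries, min_support=1):
    """Return {frozenset(attributes): support} for every closed attribute set.

    Brute-force enumeration, exponential in the number of attributes;
    intended only for small illustrative workloads.
    """
    transactions = [frozenset(q) for q in queries]
    items = sorted(set().union(*transactions))

    # Support of a candidate = number of queries that reference all its attributes.
    support = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_support:
                support[candidate] = count

    # Keep a set only if no proper superset has the same support.
    closed = {}
    for itemset, count in support.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in support.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical workload: each query is listed as the attributes it accesses.
workload = [
    {"l_quantity", "l_extendedprice"},
    {"l_quantity", "l_extendedprice", "l_shipdate"},
    {"l_shipdate"},
]

result = closed_itemsets(workload)
for attrs, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(attrs), count)
```

In this toy workload, `l_quantity` alone is not closed, because every query that reads it also reads `l_extendedprice`; the pair is closed instead. Attributes that are always accessed together in this way are natural candidates for being stored in the same vertical fragment.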
