Autonomous database partitioning using data mining on single computers and cluster computers

Liangzhe Li, L. Gruenwald
Journal: Proceedings. International Database Engineering and Applications Symposium
Volume: 24 1
Pages: 32-41
Publication date: 2012-08-08
DOI: 10.1145/2351476.2351481
URL: https://doi.org/10.1145/2351476.2351481
Citations: 8

Abstract

Query response time, one of the most important measures of database system performance, is composed of I/O time and CPU time. I/O time is determined by the amount of data read from and written to disk and by how that data is laid out on disk. CPU time is determined by how the database system executes the query operations. Response time can therefore be reduced by lowering I/O time, CPU time, or both. Because retrieving data from disk is much slower than retrieving it from main memory, a common way to reduce I/O time is to cluster data on disk so that queries access only relevant data. This paper introduces an efficient algorithm, called AutoClust, for automatic database attribute clustering (also called automatic database vertical partitioning) on single computers as well as cluster computers. It is based on closed item sets mined from queries and their attributes using association rule mining. The paper then presents experimental results comparing the performance of AutoClust with that of a baseline algorithm on both single computers and cluster computers, using the TPC-H benchmark running on major commercial database systems. The experiments show that AutoClust achieves better query costs on both types of computers.
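
The abstract names closed item sets mined from queries and their attributes as the basis of AutoClust but does not give the procedure. The following is only a minimal sketch of what a closed item set means in that setting, not the AutoClust algorithm itself. It assumes each query is reduced to the set of attributes it accesses; the function name `closed_itemsets`, the `min_support` parameter, and the attribute names `a` to `d` are hypothetical and chosen for illustration.

```python
def closed_itemsets(queries, min_support=1):
    """Return {attribute_set: support} for the closed attribute sets.

    An attribute set is closed when no proper superset is accessed by
    exactly the same queries. Every closed set is the intersection of
    one or more query attribute sets, so this brute-force sketch closes
    the collection of query sets under pairwise intersection.
    """
    transactions = [frozenset(q) for q in queries]
    closed = set(transactions)
    frontier = set(transactions)
    while frontier:
        new = set()
        for a in frontier:
            for b in closed:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new

    result = {}
    for itemset in closed:
        # Support = number of queries that access every attribute in the set.
        support = sum(1 for t in transactions if itemset <= t)
        if support >= min_support:
            result[itemset] = support
    return result


# Hypothetical workload: each query is the set of attributes it touches.
example_queries = [
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
    {"a", "b", "c"},
]

for attrs, support in sorted(
    closed_itemsets(example_queries).items(),
    key=lambda kv: (-kv[1], sorted(kv[0])),
):
    print(sorted(attrs), support)
```

Attribute sets with high support, such as `{a, b}` in this toy workload, are accessed together by many queries, which is the kind of signal a vertical partitioning scheme could use when deciding which attributes to store together. The brute-force closure here is suitable only for small workloads; how AutoClust mines and uses closed item sets is described in the paper, not here.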