Query processing techniques for solid state drives

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. Published: 2009-06-29. DOI: 10.1145/1559845.1559854

Dimitris Tsirogiannis, S. Harizopoulos, Mehul A. Shah, J. Wiener, G. Graefe


Citations: 155

Abstract

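Solid state drives perform random reads more than 100x faster than traditional magnetic hard disks, while offering comparable sequential read and write bandwidth. Because of their potential to speed up applications, as well as their reduced power consumption, these new drives are expected to gradually replace hard disks as the primary permanent storage media in large data centers. However, although they may benefit applications that stress random reads immediately, they may not improve database applications, especially those running long data analysis queries. Database query processing engines have been designed around the speed mismatch between random and sequential I/O on hard disks and their algorithms currently emphasize sequential accesses for disk-resident data. In this paper, we investigate data structures and algorithms that leverage fast random reads to speed up selection, projection, and join operations in relational query processing. We first demonstrate how a column-based layout within each page reduces the amount of data read during selections and projections. We then introduce FlashJoin, a general pipelined join algorithm that minimizes accesses to base and intermediate relational data. FlashJoin's binary join kernel accesses only the join attributes, producing partial results in the form of a join index. Subsequently, its fetch kernel retrieves the attributes for later nodes in the query plan as they are needed. FlashJoin significantly reduces memory and I/O requirements for each join in the query. We implemented these techniques inside Postgres and experimented with an enterprise SSD drive. Our techniques improved query runtimes by up to 6x for queries ranging from simple relational scans and joins to full TPC-H queries.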
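The first idea in the abstract, a column-based layout within each page, can be illustrated with a minimal Python sketch. It rests on assumptions that are not from the paper: the `ColumnPage` class, the fixed per-column byte widths, and the `scan` and `row_layout_bytes` functions are hypothetical, and the byte counter is only a model of I/O, not the authors' Postgres implementation. Each page keeps the values of each column contiguously (a "minipage"), so a selection or projection needs to read only the minipages of the columns it references.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ColumnPage:
    """Hypothetical page whose rows are stored column by column ("minipages")."""
    minipages: dict[str, list]  # column name -> that column's values for the rows in this page
    widths: dict[str, int]      # column name -> assumed fixed value width in bytes

    def num_rows(self) -> int:
        # Every minipage in a page holds the same number of values.
        return len(next(iter(self.minipages.values())))


def scan(pages: list[ColumnPage], pred_col: str, pred: Callable[[object], bool],
         project_cols: list[str]) -> tuple[list[tuple], int]:
    """Select-project scan that reads only the minipages of referenced columns.

    Returns the qualifying projected rows and the modeled number of bytes read.
    """
    needed = [pred_col] + [c for c in project_cols if c != pred_col]
    rows: list[tuple] = []
    bytes_read = 0
    for page in pages:
        n = page.num_rows()
        # Model the I/O: only the referenced columns' minipages are fetched.
        bytes_read += sum(page.widths[c] * n for c in needed)
        for i, value in enumerate(page.minipages[pred_col]):
            if pred(value):
                rows.append(tuple(page.minipages[c][i] for c in project_cols))
    return rows, bytes_read


def row_layout_bytes(pages: list[ColumnPage]) -> int:
    """Bytes a row-oriented scan would touch: every column of every row."""
    return sum(sum(p.widths.values()) * p.num_rows() for p in pages)
```

For example, with one three-row page whose columns are `id` (4 bytes), `price` (8 bytes), and `comment` (100 bytes), a query returning `id` where `price > 10` reads 3 × (8 + 4) = 36 bytes under this model, versus 3 × 112 = 336 bytes if every full row were read. The sketch ignores page headers and read granularity, which matter on a real device.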


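FlashJoin's split between a join kernel and a fetch kernel can be sketched as a simplified, in-memory illustration. The following assumptions are ours, not the paper's: tables are modeled as Python dicts of column lists, a row ID (RID) is just a list index, a plain hash equi-join stands in for the binary join kernel, and the names `join_kernel` and `fetch_kernel` are hypothetical. The join kernel touches only the join columns and emits a join index of (left RID, right RID) pairs; the fetch kernel later retrieves only the attributes that downstream operators request.

```python
from __future__ import annotations

from collections import defaultdict

# Assumption: a table is a dict of column name -> list of values, and a RID is a list index.


def join_kernel(left: dict[str, list], lkey: str,
                right: dict[str, list], rkey: str) -> list[tuple[int, int]]:
    """Hash equi-join that reads only the two join columns and emits RID pairs."""
    build: dict[object, list[int]] = defaultdict(list)
    for l_rid, key in enumerate(left[lkey]):    # build side: join keys and RIDs only
        build[key].append(l_rid)
    join_index: list[tuple[int, int]] = []
    for r_rid, key in enumerate(right[rkey]):   # probe side: join keys and RIDs only
        for l_rid in build.get(key, []):
            join_index.append((l_rid, r_rid))
    return join_index


def fetch_kernel(join_index: list[tuple[int, int]],
                 left: dict[str, list], lcols: list[str],
                 right: dict[str, list], rcols: list[str]) -> list[tuple]:
    """Late materialization: fetch only the attributes a later operator needs, by RID."""
    return [
        tuple(left[c][l_rid] for c in lcols) + tuple(right[c][r_rid] for c in rcols)
        for l_rid, r_rid in join_index
    ]
```

For example, joining `customers = {"custkey": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]}` with `orders = {"orderkey": [10, 11, 12, 13], "custkey": [2, 1, 2, 4], "total": [5.0, 7.5, 3.0, 9.9]}` on `custkey` yields the join index `[(1, 0), (0, 1), (1, 2)]`, and fetching `name` and `total` gives `[("Bob", 5.0), ("Ann", 7.5), ("Bob", 3.0)]`. Because the hash table holds only join keys and RIDs rather than full tuples, this structure mirrors the memory savings the abstract attributes to FlashJoin; in a real system each fetch would be a random read against the SSD rather than a list index.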

