{"title":"Fast Multi-Column Sorting in Main-Memory Column-Stores","authors":"Wenjian Xu, Ziqiang Feng, Eric Lo","doi":"10.1145/2882903.2915205","DOIUrl":null,"url":null,"abstract":"Sorting is a crucial operation that could be used to implement SQL operators such as GROUP BY, ORDER BY, and SQL:2003 PARTITION BY. Queries with multiple attributes in those clauses are common in real workloads. When executing queries of that kind, state-of-the-art main-memory column-stores require one round of sorting per input column. With the advent of recent fast scans and denormalization techniques, that kind of multi-column sorting could become a bottleneck. In this paper, we propose a new technique called \"code massaging\", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data level parallelism. Empirical results show that a main-memory column-store with code massaging can achieve speedup of up to 4.7X, 4.7X, 4X, and 3.2X on TPC-H, TPC-H skew, TPC-DS, and real workload, respectively.","PeriodicalId":20483,"journal":{"name":"Proceedings of the 2016 International Conference on Management of Data","volume":"38 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2016-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2016 International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2882903.2915205","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
Sorting is a crucial operation that could be used to implement SQL operators such as GROUP BY, ORDER BY, and SQL:2003 PARTITION BY. Queries with multiple attributes in those clauses are common in real workloads. When executing queries of that kind, state-of-the-art main-memory column-stores require one round of sorting per input column. With the advent of recent fast scans and denormalization techniques, that kind of multi-column sorting could become a bottleneck. In this paper, we propose a new technique called "code massaging", which manipulates the bits across the columns so that the overall sorting time can be reduced by eliminating some rounds of sorting and/or by improving the degree of SIMD data level parallelism. Empirical results show that a main-memory column-store with code massaging can achieve speedup of up to 4.7X, 4.7X, 4X, and 3.2X on TPC-H, TPC-H skew, TPC-DS, and real workload, respectively.