SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures*

2019 IEEE International Symposium on Workload Characterization (IISWC), Pub Date: 2019-11-01, DOI: 10.1109/IISWC47752.2019.9042069

D. Shankar, Xiaoyi Lu, D. Panda


Citations: 6

Abstract

With the emergence of modern multi-core CPU architectures that support data parallelism via vectorization, several storage systems have been employing SIMD-based techniques to optimize data-parallel operations on in-memory structures like hash tables. In this paper, we perform an in-depth characterization of the opportunities for incorporating AVX vectorization-based SIMD-aware designs for hash table lookups on emerging CPU architectures. We analyze the challenges and design dimensions involved in exploiting vectorization-based parallel key searching over cache-optimized non-SIMD hash tables. Based on this, we design a comprehensive micro-benchmark suite, SimdHT-Bench, that enables evaluating the performance and applicability of CPU SIMD-aware hash table designs for accelerating different read-intensive workloads. With SimdHT-Bench, we study five different use-case scenarios with varied workload patterns, on the latest Intel Skylake and Intel Cascade Lake multi-core CPU nodes. Further, to validate the applicability of SimdHT-Bench, we employ these performance studies to design a high-performance SIMD-aware RDMA-based in-memory key-value store to accelerate the Memcached ‘Multi-Get’ workload. We demonstrate that the SIMD-integrated designs can achieve 1.45x to 2.04x improvement in server-side Get throughput and up to 34% improvement in end-to-end Multi-Get latencies over the state-of-the-art CPU-optimized non-SIMD MemC3 hash table design, on a high-performance compute cluster with Intel Skylake processors and InfiniBand EDR interconnects.
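The abstract does not spell out a bucket layout or lookup routine, so the following is only a minimal sketch of what "vectorization-based parallel key searching" within a single hash table bucket can look like. The bucket layout (`bucket_t`), the choice of eight 32-bit key tags per bucket, and the function names `match_mask` and `bucket_lookup` are all assumptions made for illustration and are not taken from SimdHT-Bench or MemC3. When the compiler targets AVX2, the eight tags are compared against the probe tag with one vector comparison; otherwise a scalar loop computes the same bitmask.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Assumption: 8 x 32-bit key tags per bucket, i.e. one 256-bit AVX register. */
#define SLOTS_PER_BUCKET 8

/* Hypothetical bucket layout: tags are stored contiguously so that a single
 * vector load covers every slot in the bucket. */
typedef struct {
    uint32_t tags[SLOTS_PER_BUCKET];
    uint64_t values[SLOTS_PER_BUCKET];
} bucket_t;

/* Returns a bitmask in which bit i is set when b->tags[i] == tag. */
static uint32_t match_mask(const bucket_t *b, uint32_t tag) {
#if defined(__AVX2__)
    /* Load all eight tags, broadcast the probe tag, compare lane by lane. */
    __m256i slots  = _mm256_loadu_si256((const __m256i *)b->tags);
    __m256i needle = _mm256_set1_epi32((int)tag);
    __m256i eq     = _mm256_cmpeq_epi32(slots, needle);
    /* Collapse the per-lane results into one bit per 32-bit lane. */
    return (uint32_t)_mm256_movemask_ps(_mm256_castsi256_ps(eq));
#else
    /* Scalar fallback that produces the same mask without SIMD. */
    uint32_t mask = 0;
    for (int i = 0; i < SLOTS_PER_BUCKET; i++)
        mask |= (uint32_t)(b->tags[i] == tag) << i;
    return mask;
#endif
}

/* Probes one bucket. Returns 1 and writes the value of the first matching
 * slot to *out on a hit; returns 0 on a miss. */
static int bucket_lookup(const bucket_t *b, uint32_t tag, uint64_t *out) {
    uint32_t mask = match_mask(b, tag);
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (mask & (1u << i)) {
            *out = b->values[i];
            return 1;
        }
    }
    return 0;
}
```

Compiling with `-mavx2` (GCC or Clang) selects the vector path. A real table would still need to verify the full key after a tag match, since distinct keys can share a tag; that step is omitted here.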

