SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures

D. Shankar, Xiaoyi Lu, D. Panda
Published in: 2019 IEEE International Symposium on Workload Characterization (IISWC), 2019-11-01
DOI: 10.1109/IISWC47752.2019.9042069 (https://doi.org/10.1109/IISWC47752.2019.9042069)
Citations: 6

Abstract

With the emergence of modern multi-core CPU architectures that support data parallelism via vectorization, several storage systems have employed SIMD-based techniques to optimize data-parallel operations on in-memory structures such as hash tables. In this paper, we perform an in-depth characterization of the opportunities for incorporating AVX vectorization-based, SIMD-aware designs for hash table lookups on emerging CPU architectures. We analyze the challenges and design dimensions involved in exploiting vectorization-based parallel key searching over cache-optimized non-SIMD hash tables. Based on this analysis, we design a comprehensive micro-benchmark suite, SimdHT-Bench, that enables evaluating the performance and applicability of CPU SIMD-aware hash table designs for accelerating different read-intensive workloads. With SimdHT-Bench, we study five use-case scenarios with varied workload patterns on the latest Intel Skylake and Intel Cascade Lake multi-core CPU nodes. Further, to validate the applicability of SimdHT-Bench, we use these performance studies to design a high-performance, SIMD-aware, RDMA-based in-memory key-value store that accelerates the Memcached ‘Multi-Get’ workload. We demonstrate that the SIMD-integrated designs can achieve a 1.45x to 2.04x improvement in server-side Get throughput and up to a 34% improvement in end-to-end Multi-Get latencies over the state-of-the-art CPU-optimized non-SIMD MemC3 hash table design, on a high-performance compute cluster with Intel Skylake processors and InfiniBand EDR interconnects.
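To make "vectorization-based parallel key searching" concrete, the following is a minimal sketch of a SIMD bucket probe. It is not taken from SimdHT-Bench or MemC3: the bucket layout (`bucket_t`, eight 32-bit tags per bucket, tag 0 meaning an empty slot) and the function names `match_mask` and `bucket_lookup` are assumptions made for illustration. When compiled with AVX2 enabled (for example `-mavx2`), all eight tags are compared against the probe tag in one instruction; otherwise a scalar loop computes the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Assumption: 8 x 32-bit tags fill exactly one 256-bit AVX register. */
#define SLOTS_PER_BUCKET 8

/* Hypothetical bucket layout, not the layout used in the paper. */
typedef struct {
    uint32_t tags[SLOTS_PER_BUCKET];   /* key fingerprints; 0 marks an empty slot */
    uint64_t values[SLOTS_PER_BUCKET]; /* payload stored alongside each tag */
} bucket_t;

/* Returns a bitmask in which bit i is set when slot i holds `tag`. */
unsigned match_mask(const bucket_t *b, uint32_t tag)
{
    assert(b != NULL);
#if defined(__AVX2__)
    /* Load all 8 tags, broadcast the probe tag, compare lane by lane. */
    __m256i stored = _mm256_loadu_si256((const __m256i *)b->tags);
    __m256i probe  = _mm256_set1_epi32((int)tag);
    __m256i eq     = _mm256_cmpeq_epi32(stored, probe);
    /* Collapse the 8 lane results into the low 8 bits of an integer. */
    return (unsigned)_mm256_movemask_ps(_mm256_castsi256_ps(eq));
#else
    /* Scalar fallback: same result, one slot at a time. */
    unsigned mask = 0;
    for (int i = 0; i < SLOTS_PER_BUCKET; i++) {
        if (b->tags[i] == tag) {
            mask |= 1u << i;
        }
    }
    return mask;
#endif
}

/* Returns 1 and writes the value of the first matching slot, or 0 on a miss. */
int bucket_lookup(const bucket_t *b, uint32_t tag, uint64_t *out)
{
    unsigned mask = match_mask(b, tag);
    if (mask == 0) {
        return 0;
    }
    int slot = 0;
    while ((mask & 1u) == 0) {  /* find the lowest set bit */
        mask >>= 1;
        slot++;
    }
    *out = b->values[slot];
    return 1;
}
```

A real table would typically treat a tag match only as a candidate and confirm it by comparing the full key; that step is omitted here. A Multi-Get style batch could call `bucket_lookup` once per requested key, which is where a wider bucket and a single-instruction compare would be expected to pay off for read-intensive workloads.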