BHive: A Benchmark Suite and Measurement Framework for Validating x86-64 Basic Block Performance Models

2019 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2019-11-01 DOI:10.1109/IISWC47752.2019.9042166

Yishen Chen, Ajay Brahmakshatriya, Charith Mendis, Alex Renda, Eric Hamilton Atkinson, O. Sýkora, Saman P. Amarasinghe, Michael Carbin


Citations: 23

Abstract

Compilers and performance engineers use hardware performance models to simplify program optimizations. Performance models provide a necessary abstraction over complex modern processors. However, constructing and maintaining a performance model can be onerous, given the numerous microarchitectural optimizations employed by modern processors. Despite their complexity and reported inaccuracy (e.g., deviating from native measurement by more than 30%), existing performance models, such as IACA and llvm-mca, have not been systematically validated, because there is no scalable machine code profiler that can automatically obtain the throughput of arbitrary basic blocks while conforming to common modeling assumptions. In this paper, we present a novel profiler that can profile arbitrary memory-accessing basic blocks without any user intervention. We used this profiler to build BHive, a benchmark for systematic validation of performance models of x86-64 basic blocks. We used BHive to evaluate four existing performance models: IACA, llvm-mca, Ithemal, and OSACA. We automatically cluster basic blocks in the benchmark suite based on their utilization of CPU resources. Using this clustering, our benchmark can give a detailed analysis of a performance model's strengths and weaknesses on different workloads (e.g., vectorized vs. scalar basic blocks). We additionally demonstrate that our dataset captures the basic properties of two Google applications: Spanner and Dremel.
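To illustrate the kind of per-cluster comparison the abstract describes, the sketch below computes a mean relative error between measured and predicted basic-block throughput, grouped by cluster. It is only a minimal illustration: the function name, the cluster labels, and every number are hypothetical placeholders, and the error metric is an assumption rather than something stated in the abstract.

```python
from collections import defaultdict


def mean_relative_error(samples):
    """Return the mean relative error per cluster.

    samples: iterable of (cluster, measured, predicted) tuples, where
    measured and predicted are throughput values in the same unit
    (for example, cycles per fixed number of block iterations).
    """
    per_cluster = defaultdict(list)
    for cluster, measured, predicted in samples:
        # Relative error of the model's prediction against the measurement.
        per_cluster[cluster].append(abs(predicted - measured) / measured)
    return {c: sum(errs) / len(errs) for c, errs in per_cluster.items()}


# Hypothetical values for illustration only; these are not BHive data.
samples = [
    ("scalar", 100.0, 110.0),
    ("scalar", 200.0, 180.0),
    ("vectorized", 100.0, 150.0),
]
errors = mean_relative_error(samples)
for cluster, err in sorted(errors.items()):
    print(f"{cluster}: {err:.1%}")
```

Grouping the error by cluster instead of reporting a single aggregate is what lets a comparison expose where a model is weak, for instance on vectorized blocks, even when its overall average looks acceptable.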

