Boomerang: A Metadata-Free Architecture for Control Flow Delivery

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Published: 2017-05-08. DOI: 10.1109/HPCA.2017.53

Rakesh Kumar, Cheng-Chieh Huang, Boris Grot, V. Nagarajan


Citations: 42

Abstract

Contemporary server workloads feature massive instruction footprints stemming from deep, layered software stacks. The active instruction working set of the entire stack can easily reach into megabytes, resulting in frequent front-end stalls due to instruction cache misses and pipeline flushes due to branch target buffer (BTB) misses. While a number of techniques have been proposed to address these problems, every one of them requires dedicated metadata structures, translating into significant storage and complexity costs. In this paper, we ask whether it is possible to achieve high-performance control flow delivery without the metadata costs of prior techniques. We revisit a previously proposed approach, branch-predictor-directed prefetching, which leverages just the branch predictor and BTB to discover and prefetch missing instruction cache blocks by exploring the program control flow ahead of the core front-end. Contrary to conventional wisdom, we find that this approach can be effective in covering instruction cache misses in modern CMPs with long LLC access latencies and multi-MB server binaries. Our first contribution lies in explaining why branch-predictor-directed prefetching is effective. Our second contribution is Boomerang, a metadata-free architecture for control flow delivery. Boomerang leverages a branch-predictor-directed prefetcher to discover and prefill not only the instruction cache blocks but also the missing BTB entries. Crucially, we demonstrate that the additional hardware cost required to identify and fill BTB misses is negligible. Our experimental evaluation shows that Boomerang matches the performance of the state-of-the-art control flow delivery scheme without the latter's high metadata and complexity overheads.
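The abstract describes the mechanism only at a high level, so the following is a toy Python model to make the idea concrete, not the authors' design. A branch predictor walks the predicted path ahead of fetch, records each predicted basic block in a fetch target queue (FTQ), and prefetches any instruction cache block those entries touch; when the walk hits a BTB miss, the model "predecodes" the missing basic block and prefills the BTB rather than waiting for a later misfetch. Everything named here is a hypothetical simplification: `BTBEntry`, `BoomerangSketch`, the basic-block-oriented BTB, the FTQ and its depth, the 64-byte block size, the ground-truth `program` table standing in for a predecoder, instantaneous prefetch fills, and the policy of stopping at an unknown block.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

BLOCK_SIZE = 64  # assumed instruction cache block size, in bytes


@dataclass(frozen=True)
class BTBEntry:
    """Hypothetical basic-block-oriented BTB entry, keyed by basic-block start PC."""
    bb_size: int          # bytes from the block start through its terminating branch
    target: int           # taken target of the terminating branch
    is_conditional: bool  # unconditional branches are always treated as taken


def block_of(addr: int) -> int:
    """Align an address down to its cache block."""
    return addr - addr % BLOCK_SIZE


class BoomerangSketch:
    """Toy model of branch-predictor-directed prefetching with BTB prefill."""

    def __init__(self, program: Dict[int, BTBEntry],
                 predict_taken: Callable[[int], bool], ftq_depth: int = 8):
        self.program = program              # stands in for predecoding fetched blocks
        self.predict_taken = predict_taken  # hypothetical direction predictor
        self.btb: Dict[int, BTBEntry] = {}  # starts cold
        self.icache: set = set()            # resident cache-block addresses
        self.ftq: deque = deque(maxlen=ftq_depth)  # fetch target queue
        self.prefetches: List[int] = []     # blocks requested by the prefetcher
        self.btb_fills: List[int] = []      # basic blocks whose BTB entries were prefilled

    def prefetch_blocks(self, start: int, size: int) -> None:
        """Request every cache block spanned by [start, start + size) that is not resident."""
        blk = block_of(start)
        while blk < start + size:
            if blk not in self.icache:
                self.prefetches.append(blk)
                self.icache.add(blk)  # simplification: the fill completes instantly
            blk += BLOCK_SIZE

    def fill_btb_miss(self, pc: int) -> Optional[BTBEntry]:
        """On a BTB miss, bring in the block(s) and 'predecode' the terminating branch."""
        entry = self.program.get(pc)
        if entry is None:
            return None  # nothing to predecode: stop exploring this path
        self.prefetch_blocks(pc, entry.bb_size)
        self.btb[pc] = entry
        self.btb_fills.append(pc)
        return entry

    def run_ahead(self, pc: int, steps: int) -> None:
        """Let the branch predictor walk the predicted path ahead of fetch."""
        for _ in range(steps):
            if len(self.ftq) == self.ftq.maxlen:
                break  # FTQ full: lookahead is bounded by its depth
            entry = self.btb.get(pc)
            if entry is None:
                entry = self.fill_btb_miss(pc)  # BTB miss found during run-ahead: prefill it
            if entry is None:
                break
            self.ftq.append((pc, entry.bb_size))
            self.prefetch_blocks(pc, entry.bb_size)  # no-op if the BTB fill already fetched it
            taken = not entry.is_conditional or self.predict_taken(pc)
            pc = entry.target if taken else pc + entry.bb_size
```

For example, with a cold BTB, blocks at `0x1000` (unconditional jump to `0x2000`), `0x2000` (conditional, predicted not taken, 0x50 bytes long) and `0x2050`, one call to `run_ahead(0x1000, 10)` prefills three BTB entries and prefetches cache blocks `0x1000`, `0x2000` and `0x2040`; a second walk over the same path finds everything resident and issues nothing new. The only extra state beyond the predictor, BTB and FTQ is a couple of logging lists used for inspection, which loosely mirrors the abstract's point that the approach needs no dedicated prefetch metadata. A realistic model would add fill latency, a consuming fetch stage and misprediction recovery, none of which are modeled here.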



