Needle: Leveraging Program Analysis to Analyze and Extract Accelerators from Whole Programs

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). Pub Date: 2017-02-01. DOI: 10.1109/HPCA.2017.59

Snehasish Kumar, Nick Sumner, V. Srinivasan, Steve Margerm, Arrvindh Shriraman


Citations: 16

Abstract

Technology constraints have increasingly led to the adoption of specialized coprocessors, i.e., hardware accelerators. The first challenge that computer architects encounter is identifying "what to specialize in the program". We demonstrate that this requires precise enumeration of program paths based on dynamic program behavior. We hypothesize that path-based [4] accelerator offloading leads to good coverage of dynamic instructions and improves energy efficiency. Unfortunately, hot paths across programs demonstrate diverse control flow behavior. Accelerators (typically based on dataflow execution) often lack energy-efficient, complexity-effective, and high-performance support (e.g., branch prediction) for control flow. We have developed NEEDLE, an LLVM-based compiler framework that leverages dynamic profile information to identify, merge, and offload acceleratable paths from whole applications. NEEDLE derives insight into what code coverage (and consequently energy reduction) an accelerator can achieve. We also develop a novel program abstraction for offload called Braid, which merges common code regions across different paths to improve coverage of the accelerator while trading off the increase in dataflow size. This enables coarse-grained offloading, reducing interaction with the host CPU core. To prepare the Braids and paths for acceleration, NEEDLE generates software frames. Software frames enable energy-efficient speculative execution on accelerators. They are independent of the accelerator microarchitecture and support speculative execution, including memory operations. NEEDLE is automated and has been used to analyze 225K paths across 29 workloads. It filtered and ranked 154K paths for acceleration across unmodified SPEC, PARSEC and PERFECT workload suites. We target NEEDLE's offload regions toward a CGRA and demonstrate 34% performance and 20% energy improvement.
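The coverage-versus-size trade-off that the abstract attributes to Braids can be illustrated with a small sketch. This is not NEEDLE's implementation: the block sizes, path counts, function names, and the rule of grouping paths that share an entry and exit block are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical profile data: static instruction count per basic block.
BLOCK_SIZE = {"A": 4, "B": 6, "C": 3, "D": 5, "E": 2}

# Hypothetical path profile: sequence of basic blocks -> times executed.
PATH_COUNTS = {
    ("A", "B", "D"): 70,
    ("A", "C", "D"): 25,
    ("A", "E"): 5,
}


def path_weight(path, count):
    """Dynamic instructions attributed to one path."""
    return count * sum(BLOCK_SIZE[b] for b in path)


def path_coverage(path_counts):
    """Fraction of all dynamic instructions covered by each path."""
    total = sum(path_weight(p, c) for p, c in path_counts.items())
    return {p: path_weight(p, c) / total for p, c in path_counts.items()}


def build_braids(path_counts):
    """Group paths that share entry and exit blocks (an assumed merge rule)."""
    groups = defaultdict(list)
    for path in path_counts:
        groups[(path[0], path[-1])].append(path)

    coverage = path_coverage(path_counts)
    braids = []
    for (entry, exit_), paths in groups.items():
        blocks = set().union(*paths)  # blocks common to several paths count once
        braids.append({
            "entry": entry,
            "exit": exit_,
            "paths": paths,
            "coverage": sum(coverage[p] for p in paths),
            "static_size": sum(BLOCK_SIZE[b] for b in blocks),
        })
    # Rank candidates by the dynamic instructions they would offload.
    return sorted(braids, key=lambda b: b["coverage"], reverse=True)


if __name__ == "__main__":
    for braid in build_braids(PATH_COUNTS):
        print(braid["entry"], braid["exit"],
              round(braid["coverage"], 3), braid["static_size"])
```

With these made-up numbers, the hottest single path covers about 76% of dynamic instructions with 15 static instructions, while the merged region from A to D covers about 98% with 18, which is the kind of trade-off between coverage and dataflow size the abstract describes.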

