Characterizing the costs and benefits of hardware parallelism in accelerator cores

2013 IEEE 31st International Conference on Computer Design (ICCD) Pub Date : 2013-11-07 DOI:10.1109/ICCD.2013.6657021

Steven J. Battle, Mark Hempstead

{"title":"Characterizing the costs and benefits of hardware parallelism in accelerator cores","authors":"Steven J. Battle, Mark Hempstead","doi":"10.1109/ICCD.2013.6657021","DOIUrl":null,"url":null,"abstract":"Power and utilization constraints are limiting the performance gains of traditional architectures. Designers are increasingly embracing specialization to improve performance in the era of dark-silicon. General purpose processors are beginning to resemble SOC's from the embedded domain, and now include many specialized accelerator cores to improve computation-throughput while reducing the energy-cost of computation. The design-space of accelerator cores is wide and varied. Designers are able to specify how much parallelism to expose in hardware by varying input width, pipeline depth, number of compute-lanes, etc. In this paper we study three accelerator cores: DES, FFT, and Jacobi Transform, exhibiting three different types of computation: streaming cryptographic, butterfly DSP, and stencil. We investigate methods to increase parallelism within the accelerator while remaining on the pareto-frontier, and examine the trade-offs faced by designers with respect to area, power, and throughput. We present models of these trade-offs and provide insight into the design of cores under real-world constraints.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 31st International Conference on Computer Design (ICCD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2013.6657021","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

Abstract

Power and utilization constraints are limiting the performance gains of traditional architectures. Designers are increasingly embracing specialization to improve performance in the era of dark-silicon. General purpose processors are beginning to resemble SOC's from the embedded domain, and now include many specialized accelerator cores to improve computation-throughput while reducing the energy-cost of computation. The design-space of accelerator cores is wide and varied. Designers are able to specify how much parallelism to expose in hardware by varying input width, pipeline depth, number of compute-lanes, etc. In this paper we study three accelerator cores: DES, FFT, and Jacobi Transform, exhibiting three different types of computation: streaming cryptographic, butterfly DSP, and stencil. We investigate methods to increase parallelism within the accelerator while remaining on the pareto-frontier, and examine the trade-offs faced by designers with respect to area, power, and throughput. We present models of these trade-offs and provide insight into the design of cores under real-world constraints.

查看原文本刊更多论文

描述加速器核心中硬件并行的成本和收益

功耗和利用率限制限制了传统架构的性能提升。在暗硅时代，设计师们越来越多地采用专业化来提高性能。通用处理器开始类似于嵌入式领域的SOC，现在包括许多专门的加速器内核，以提高计算吞吐量，同时降低计算的能量成本。加速器核心的设计空间是广泛而多样的。设计人员可以通过改变输入宽度、管道深度、计算通道数量等来指定在硬件中暴露多少并行性。在本文中，我们研究了三种加速器核心:DES, FFT和Jacobi Transform，展示了三种不同类型的计算:流加密，蝴蝶DSP和模板。我们研究了在保持帕累托边界的同时增加加速器内并行性的方法，并研究了设计师在面积、功率和吞吐量方面面临的权衡。我们提出了这些权衡的模型，并提供了在现实世界约束下的核心设计的见解。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2013 IEEE 31st International Conference on Computer Design (ICCD)

自引率

0.00%

发文量