Characterizing the costs and benefits of hardware parallelism in accelerator cores

2013 IEEE 31st International Conference on Computer Design (ICCD)
Pub Date: 2013-11-07
DOI: 10.1109/ICCD.2013.6657021

Steven J. Battle, Mark Hempstead


Citations: 1

Abstract




Power and utilization constraints are limiting the performance gains of traditional architectures. In the era of dark silicon, designers are increasingly embracing specialization to improve performance. General-purpose processors are beginning to resemble SoCs from the embedded domain and now include many specialized accelerator cores that raise computational throughput while reducing the energy cost of computation. The design space of accelerator cores is wide and varied: designers can specify how much parallelism to expose in hardware by varying the input width, pipeline depth, number of compute lanes, and similar parameters. In this paper we study three accelerator cores, DES, FFT, and Jacobi Transform, which exhibit three different types of computation: streaming cryptographic, butterfly DSP, and stencil. We investigate methods to increase parallelism within an accelerator while remaining on the Pareto frontier, and we examine the trade-offs designers face among area, power, and throughput. We present models of these trade-offs and provide insight into the design of cores under real-world constraints.
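To make the area/power/throughput trade-off concrete, here is a minimal Python sketch. It is not the authors' model. It assumes a hypothetical design space spanned by two of the knobs the abstract names, the number of compute lanes and the pipeline depth. It also assumes a made-up first-order cost model: the clock period is the logic delay divided by the depth plus a fixed register delay, area grows with lanes and per-stage registers, and dynamic power is approximated as area × frequency. The names `toy_model`, `dominates`, and `pareto_under_budget` and every coefficient are illustrative assumptions. The sketch discards designs over a power budget and keeps those on the area/throughput Pareto frontier.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class DesignPoint:
    lanes: int         # replicated compute lanes (spatial parallelism)
    depth: int         # pipeline stages (temporal parallelism)
    area: float        # arbitrary area units (hypothetical)
    power: float       # arbitrary power units (hypothetical)
    throughput: float  # results per ns


def toy_model(lanes: int, depth: int,
              t_logic_ns: float = 4.0, t_reg_ns: float = 1.0,
              lane_area: float = 1.0, reg_area: float = 0.25) -> DesignPoint:
    """Hypothetical cost/benefit model; every coefficient is illustrative."""
    # Splitting the logic across `depth` stages shortens the clock period, but each
    # stage adds a fixed register delay, so returns diminish as depth grows.
    period_ns = t_logic_ns / depth + t_reg_ns
    freq_ghz = 1.0 / period_ns
    # Each lane pays for its datapath plus one register bank per pipeline stage.
    area = lanes * (lane_area + reg_area * depth)
    # Dynamic power approximated as (switched area) x (clock frequency).
    power = area * freq_ghz
    # One result per lane per cycle once the pipeline is full.
    throughput = lanes * freq_ghz
    return DesignPoint(lanes, depth, area, power, throughput)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if `a` is no worse than `b` in area and throughput and strictly better in one."""
    no_worse = a.area <= b.area and a.throughput >= b.throughput
    better = a.area < b.area or a.throughput > b.throughput
    return no_worse and better


def pareto_under_budget(points: Iterable[DesignPoint],
                        power_budget: float) -> List[DesignPoint]:
    """Keep designs within the power budget, then drop area/throughput-dominated ones."""
    feasible = [p for p in points if p.power <= power_budget]
    return [p for p in feasible if not any(dominates(q, p) for q in feasible)]


designs = [toy_model(lanes, depth) for lanes, depth in product([1, 2, 4], [1, 2, 4])]
for p in sorted(pareto_under_budget(designs, power_budget=2.5), key=lambda p: p.area):
    print(f"lanes={p.lanes} depth={p.depth} area={p.area:.2f} "
          f"power={p.power:.2f} throughput={p.throughput:.3f}")
```

This framing treats power as a hard budget and area and throughput as the two objectives, which is only one option. With these illustrative coefficients, the single-stage multi-lane points (`lanes=2, depth=1` and `lanes=4, depth=1`) are dominated, because adding pipeline stages raises throughput more cheaply in area than replicating lanes does. Different coefficients would move the frontier.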


Source venue

2013 IEEE 31st International Conference on Computer Design (ICCD)
