Steven J. Battle, Mark Hempstead. 2013 IEEE 31st International Conference on Computer Design (ICCD). Published 2013-11-07. DOI: 10.1109/ICCD.2013.6657021 (https://doi.org/10.1109/ICCD.2013.6657021)
Characterizing the costs and benefits of hardware parallelism in accelerator cores
Power and utilization constraints are limiting the performance gains of traditional architectures. Designers are increasingly embracing specialization to improve performance in the era of dark silicon. General-purpose processors are beginning to resemble SoCs from the embedded domain, and now include many specialized accelerator cores that improve computational throughput while reducing the energy cost of computation. The design space of accelerator cores is wide and varied. Designers can choose how much parallelism to expose in hardware by varying input width, pipeline depth, number of compute lanes, and similar parameters. In this paper we study three accelerator cores, DES, FFT, and Jacobi Transform, which exhibit three different types of computation: streaming cryptographic, butterfly DSP, and stencil. We investigate methods to increase parallelism within the accelerator while remaining on the Pareto frontier, and examine the trade-offs designers face with respect to area, power, and throughput. We present models of these trade-offs and provide insight into the design of cores under real-world constraints.
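To make the notion of "remaining on the Pareto frontier" concrete, the following is a minimal sketch of how non-dominated design points could be selected when area and power are minimized and throughput is maximized. It is not the authors' method or data: the `DesignPoint` type, the configuration names, and every numeric value are hypothetical placeholders chosen only for illustration.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    """One hypothetical accelerator configuration (arbitrary units)."""
    name: str
    area: float        # lower is better
    power: float       # lower is better
    throughput: float  # higher is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """Return True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (a.area <= b.area and a.power <= b.power
                and a.throughput >= b.throughput)
    strictly_better = (a.area < b.area or a.power < b.power
                       or a.throughput > b.throughput)
    return no_worse and strictly_better


def pareto_frontier(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Placeholder values, NOT measurements from the paper.
points = [
    DesignPoint("1-lane", area=1.0, power=1.0, throughput=1.0),
    DesignPoint("2-lane", area=2.0, power=1.8, throughput=1.9),
    DesignPoint("4-lane", area=4.0, power=3.9, throughput=3.0),
    DesignPoint("2-lane-wasteful", area=2.5, power=2.0, throughput=1.5),
]

frontier_names = [p.name for p in pareto_frontier(points)]
print(frontier_names)
```

In this toy data set the "2-lane-wasteful" point is dropped because the "2-lane" point is smaller, lower power, and faster. The other three points each represent a different area, power, and throughput trade-off, so all of them stay on the frontier.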