Enhancing Design Space Exploration by Extending CPU/GPU Specifications onto FPGAs

ACM Trans. Embed. Comput. Syst. Pub Date: 2015-03-25 DOI: 10.1145/2656207

Muhsen Owaida, G. Falcão, J. Andrade, C. Antonopoulos, Nikolaos Bellas, M. Purnaprajna, D. Novo, G. Karakonstantis, A. Burg, P. Ienne


Citations: 11

Abstract



The design cycle for complex special-purpose computing systems is extremely costly and time-consuming. It involves a multiparametric design space exploration for optimization, followed by design verification. Designers of special-purpose VLSI implementations often need to explore parameters, such as optimal bitwidth and data representation, through time-consuming Monte Carlo simulations. A prominent example of this simulation-based exploration is the design of decoders for error-correcting systems, such as the Low-Density Parity-Check (LDPC) codes adopted by modern communication standards, which involves thousands of Monte Carlo runs for each design point. Currently, high-performance computing offers a wide range of acceleration options, from multicore CPUs to Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). Exploiting diverse target architectures typically requires developing multiple code versions, often using distinct programming paradigms. In this context, we evaluate the concept of retargeting a single OpenCL program to multiple platforms, thereby significantly reducing design time. A single OpenCL-based parallel kernel is used without modifications or code tuning on multicore CPUs, GPUs, and FPGAs. We use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL, to introduce FPGAs as a potential platform for efficiently executing simulations coded in OpenCL. We use LDPC decoding simulations as a case study. Experimental results were obtained by testing a variety of regular and irregular LDPC codes, ranging from short/medium-length (e.g., 8,000-bit) to long (e.g., 64,800-bit) DVB-S2 codes. We observe that, depending on the design parameters to be simulated and on the dimension and phase of the design, either the GPU or the FPGA may be the more convenient choice, each providing a different acceleration factor over conventional multicore CPUs.
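The kind of simulation-based exploration the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' OpenCL kernel and not SOpenCL output. Everything in it is an assumption made for illustration: the toy (7,4) Hamming parity-check matrix standing in for a real LDPC matrix, the min-sum decoding rule, the BPSK/AWGN channel, the saturation limit `max_abs`, and the function names `quantize`, `min_sum_decode`, `bit_error_rate`, and `explore`. Only the channel LLRs are quantized here; internal decoder messages stay in floating point.

```python
import math
import random

# Toy parity-check matrix (a (7,4) Hamming code), an assumed stand-in for a real LDPC matrix.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
N = len(H[0])
CHECKS = [[j for j, h in enumerate(row) if h] for row in H]


def quantize(llr, bits, max_abs=8.0):
    """Uniformly quantize an LLR to a signed `bits`-bit value, saturating at +/-max_abs."""
    if bits < 2:
        raise ValueError("need at least 2 bits (sign plus one magnitude bit)")
    levels = 2 ** (bits - 1) - 1  # e.g. 7 magnitude levels for 4 bits
    step = max_abs / levels
    q = max(-levels, min(levels, round(llr / step)))
    return q * step


def min_sum_decode(channel_llr, iterations=10):
    """Flooding-schedule min-sum decoder; returns one hard decision (0/1) per bit."""
    # msgs[c][j] is the check-to-variable message from check c to variable j.
    msgs = [{j: 0.0 for j in check} for check in CHECKS]
    total = list(channel_llr)
    hard = [1 if t < 0 else 0 for t in total]
    for _ in range(iterations):
        for c, check in enumerate(CHECKS):
            # Variable-to-check messages exclude this check's previous contribution.
            v2c = {j: total[j] - msgs[c][j] for j in check}
            for j in check:
                others = [v2c[k] for k in check if k != j]
                negatives = sum(1 for x in others if x < 0)
                sign = -1.0 if negatives % 2 else 1.0
                msgs[c][j] = sign * min(abs(x) for x in others)
        total = [channel_llr[j] + sum(m[j] for m in msgs if j in m) for j in range(N)]
        hard = [1 if t < 0 else 0 for t in total]
        if all(sum(hard[j] for j in check) % 2 == 0 for check in CHECKS):
            break  # every parity check is satisfied
    return hard


def bit_error_rate(bits, ebn0_db, frames, rng):
    """Monte Carlo BER estimate for one design point (one LLR bitwidth)."""
    rate = (N - len(H)) / N
    sigma = math.sqrt(1.0 / (2.0 * rate * 10 ** (ebn0_db / 10.0)))
    errors = 0
    for _ in range(frames):
        # All-zero codeword, BPSK-mapped to +1, sent over an AWGN channel.
        received = [1.0 + rng.gauss(0.0, sigma) for _ in range(N)]
        llr = [quantize(2.0 * y / sigma ** 2, bits) for y in received]
        errors += sum(min_sum_decode(llr))  # any decoded 1 is a bit error
    return errors / (frames * N)


def explore(bitwidths=(2, 3, 4, 6), ebn0_db=3.0, frames=2000, seed=1):
    """Sweep the bitwidth parameter, reusing the same noise seed for each design point."""
    return {b: bit_error_rate(b, ebn0_db, frames, random.Random(seed)) for b in bitwidths}


if __name__ == "__main__":
    for bits, ber in explore().items():
        print(f"{bits}-bit LLRs: BER = {ber:.4f}")
```

Each call to `bit_error_rate` is one design point, and its frames are independent of each other, which is what makes this workload a natural fit for the parallel platforms the abstract compares. A realistic study would use far longer codes and many more frames per point.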
