Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case
G. Falcão, Muhsen Owaida, D. Novo, M. Purnaprajna, Nikolaos Bellas, C. Antonopoulos, G. Karakonstantis, A. Burg, P. Ienne
2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2012-04-29
DOI: 10.1109/FCCM.2012.46
Citations: 19
Abstract
Hardware designers and engineers typically need to explore a multi-parametric design space to find the best configuration for their designs, using simulations that can take weeks to months to complete. For example, designers of special-purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case in the development of complex algorithms such as the Low-Density Parity-Check (LDPC) decoders used in modern communication systems. High-performance computing currently offers a wide range of acceleration options, from multicore CPUs to graphics processing units (GPUs) and FPGAs, and the ideal architecture varies with the simulation requirements. In this paper we propose a new design flow based on OpenCL, a unified multiplatform programming model, which accelerates LDPC decoding simulations and thereby significantly reduces architectural exploration and design time. The same OpenCL parallel kernels run without modification or code tuning on multicore CPUs, GPUs and FPGAs. To map the simulations onto FPGAs, we use SOpenCL (Silicon to OpenCL), a tool that automatically converts OpenCL kernels to RTL. To the best of our knowledge, this is the first time that a single, unmodified OpenCL code has been used to target these three different platforms. We show that, depending on the design parameters to be explored in the simulation and on the size and phase of the design, either the GPU or the FPGA may be the more convenient choice, each providing different acceleration factors. For example, although simulations can typically execute more than 3× faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.
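To make the bit-width exploration concrete, the following Python sketch models a single LDPC check-node update on a fixed-point datapath whose word length is a parameter. It is a minimal illustration under stated assumptions: it uses the min-sum decoding rule and symmetric saturation, neither of which the abstract specifies, and `saturate` and `check_node_update` are hypothetical helpers, not the paper's OpenCL kernels.

```python
def saturate(x: int, bits: int) -> int:
    """Clamp x to the symmetric signed range of a `bits`-bit word:
    [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    limit = (1 << (bits - 1)) - 1
    return max(-limit, min(limit, x))


def check_node_update(incoming: list[int], bits: int) -> list[int]:
    """Min-sum check-node update (an assumed rule, not taken from the paper).

    For each edge i, the outgoing message is the product of the signs of all
    *other* incoming messages times the smallest of their magnitudes, then
    saturated to `bits` bits to model a fixed-point datapath.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]
        sign = 1
        for m in others:
            if m < 0:
                sign = -sign
        magnitude = min(abs(m) for m in others)
        outgoing.append(saturate(sign * magnitude, bits))
    return outgoing


# Sweep the word length, as a design-space exploration would.
llrs = [5, -3, 9, -12]
for bits in (3, 4, 5):
    print(bits, check_node_update(llrs, bits))
```

With these inputs, the 3-bit run clips the -5 message to -3, while 4 and 5 bits reproduce the unquantized result. A real exploration would run many such updates inside complete decoding trials and compare error rates across word lengths, which is the kind of long-running simulation the proposed flow aims to accelerate.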
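The closing trade-off between FPGA speed and synthesis overhead can be expressed as a break-even point. If synthesis takes T_synth and one simulation run takes t_GPU on the GPU and t_FPGA on the FPGA, the FPGA flow finishes first only when the number of runs N satisfies T_synth + N·t_FPGA < N·t_GPU, i.e. N > T_synth / (t_GPU - t_FPGA). The sketch below computes that threshold. It assumes one synthesis run amortized over N runs of an unchanged circuit and ignores other costs such as data transfer; the function name and the example timings are hypothetical, not measurements from the paper.

```python
import math


def fpga_break_even_runs(t_synth: float, t_gpu: float, t_fpga: float) -> float:
    """Return the smallest integer number of runs N for which the FPGA flow,
    including one synthesis step, finishes before the GPU flow:
        t_synth + N * t_fpga < N * t_gpu
    All times must use the same unit. Returns math.inf when the FPGA is not
    faster per run, since synthesis can then never be amortized."""
    saving_per_run = t_gpu - t_fpga
    if saving_per_run <= 0:
        return math.inf
    # Strict inequality: N * saving_per_run must exceed t_synth.
    return math.floor(t_synth / saving_per_run) + 1


# Hypothetical timings in minutes: 8 h of synthesis, a 30 min GPU run,
# and an FPGA run 3x faster (10 min).
print(fpga_break_even_runs(t_synth=480, t_gpu=30, t_fpga=10))
```

With these made-up numbers the FPGA only pays off from the 25th run onward. If every new bit-width configuration requires its own synthesis, short parameter sweeps may therefore favor the GPU, in line with the abstract's conclusion.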