Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case

G. Falcão, Muhsen Owaida, D. Novo, M. Purnaprajna, Nikolaos Bellas, C. Antonopoulos, G. Karakonstantis, A. Burg, P. Ienne
{"title":"Shortening Design Time through Multiplatform Simulations with a Portable OpenCL Golden-model: The LDPC Decoder Case","authors":"G. Falcão, Muhsen Owaida, D. Novo, M. Purnaprajna, Nikolaos Bellas, C. Antonopoulos, G. Karakonstantis, A. Burg, P. Ienne","doi":"10.1109/FCCM.2012.46","DOIUrl":null,"url":null,"abstract":"Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options, that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture to use can vary. In this paper we propose a new design flow based on Open CL, a unified multiplatform programming model, which accelerates LDPC decoding simulations, thereby significantly reducing architectural exploration and design time. Open CL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs and FPGAs. We use SOpen CL (Silicon to Open CL), a tool that automatically converts Open CL kernels to RTL for mapping the simulations into FPGAs. To the best of our knowledge, this is the first time that a single, unmodified Open CL code is used to target those three different platforms. We show that, depending on the design parameters to be explored in the simulation, on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors. For example, although simulations can typically execute more than 3× faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.","PeriodicalId":226197,"journal":{"name":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 20th International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2012.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19

Abstract

Hardware designers and engineers typically need to explore a multi-parametric design space in order to find the best configuration for their designs using simulations that can take weeks to months to complete. For example, designers of special purpose chips need to explore parameters such as the optimal bit width and data representation. This is the case for the development of complex algorithms such as Low-Density Parity-Check (LDPC) decoders used in modern communication systems. Currently, high-performance computing offers a wide set of acceleration options, that range from multicore CPUs to graphics processing units (GPUs) and FPGAs. Depending on the simulation requirements, the ideal architecture to use can vary. In this paper we propose a new design flow based on Open CL, a unified multiplatform programming model, which accelerates LDPC decoding simulations, thereby significantly reducing architectural exploration and design time. Open CL-based parallel kernels are used without modifications or code tuning on multicore CPUs, GPUs and FPGAs. We use SOpen CL (Silicon to Open CL), a tool that automatically converts Open CL kernels to RTL for mapping the simulations into FPGAs. To the best of our knowledge, this is the first time that a single, unmodified Open CL code is used to target those three different platforms. We show that, depending on the design parameters to be explored in the simulation, on the dimension and phase of the design, the GPU or the FPGA may suit different purposes more conveniently, providing different acceleration factors. For example, although simulations can typically execute more than 3× faster on FPGAs than on GPUs, the overhead of circuit synthesis often outweighs the benefits of FPGA-accelerated execution.
利用便携式OpenCL黄金模型通过多平台仿真缩短设计时间:LDPC解码器案例
硬件设计师和工程师通常需要探索多参数设计空间,以便使用可能需要数周至数月才能完成的模拟找到适合其设计的最佳配置。例如,特殊用途芯片的设计者需要探索诸如最佳位宽度和数据表示等参数。这就是开发复杂算法的情况,例如现代通信系统中使用的低密度奇偶校验(LDPC)解码器。目前,高性能计算提供了一系列广泛的加速选项,从多核cpu到图形处理单元(gpu)和fpga。根据仿真需求,要使用的理想体系结构可能有所不同。本文提出了一种新的基于Open CL的设计流程,该流程是一种统一的多平台编程模型,可以加速LDPC解码仿真,从而大大减少了架构探索和设计时间。开放的基于cl的并行内核无需修改或代码调优就可以在多核cpu、gpu和fpga上使用。我们使用SOpen CL (Silicon to Open CL),这是一个自动将Open CL内核转换为RTL以将模拟映射到fpga的工具。据我们所知,这是第一次使用一个未修改的Open CL代码来针对这三个不同的平台。我们表明,根据仿真中要探索的设计参数,根据设计的尺寸和相位,GPU或FPGA可以更方便地适应不同的目的,提供不同的加速因子。例如,虽然仿真在fpga上的执行速度通常比在gpu上快3倍以上,但电路合成的开销往往超过fpga加速执行的好处。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信