从RTL ip自动生成面向gpu的软件应用程序

2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) Pub Date : 2013-09-29 DOI:10.1109/CODES-ISSS.2013.6658999

N. Bombieri, F. Fummi, S. Vinco

{"title":"从RTL ip自动生成面向gpu的软件应用程序","authors":"N. Bombieri, F. Fummi, S. Vinco","doi":"10.1109/CODES-ISSS.2013.6658999","DOIUrl":null,"url":null,"abstract":"Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation intensive applications. In particular, the combination between GPUs and CPU has proved to be an effective solution for accelerating the software execution, by mixing the few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computations. In addition, sustained by the need of low power consumption besides high performance, a recent trend is combining GPUs and CPU onto a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). The good tradeoff between computing capability and power consumption makes the integrated GPUs a promising alternative for accelerating a wide range of software application for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures and such a manual parallelization often results in being unsatisfactory. This paper presents a methodology to automatically generate software applications for GPUs, by reusing existing and preverified register-transfer level (RTL) intellectual-properties (IPs). The methodology aims at exploiting the intrinsic parallelism of RTL IPs (such as process concurrency and pipeline micro-architecture) for generating the parallel software implementation of the functionality. The experimental results show how the performance obtained by running the RTL functionality as software applications on GPUs outperform those provided by the RTL code mapped into a hardware accelerator.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"On the automatic generation of GPU-oriented software applications from RTL IPs\",\"authors\":\"N. Bombieri, F. Fummi, S. Vinco\",\"doi\":\"10.1109/CODES-ISSS.2013.6658999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation intensive applications. In particular, the combination between GPUs and CPU has proved to be an effective solution for accelerating the software execution, by mixing the few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computations. In addition, sustained by the need of low power consumption besides high performance, a recent trend is combining GPUs and CPU onto a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). The good tradeoff between computing capability and power consumption makes the integrated GPUs a promising alternative for accelerating a wide range of software application for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures and such a manual parallelization often results in being unsatisfactory. This paper presents a methodology to automatically generate software applications for GPUs, by reusing existing and preverified register-transfer level (RTL) intellectual-properties (IPs). The methodology aims at exploiting the intrinsic parallelism of RTL IPs (such as process concurrency and pipeline micro-architecture) for generating the parallel software implementation of the functionality. The experimental results show how the performance obtained by running the RTL functionality as software applications on GPUs outperform those provided by the RTL code mapped into a hardware accelerator.\",\"PeriodicalId\":163484,\"journal\":{\"name\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"volume\":\"158 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CODES-ISSS.2013.6658999\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CODES-ISSS.2013.6658999","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

图形处理单元(gpu)作为加速计算密集型应用的一种新的计算范式已经被探索。特别是，GPU和CPU的结合已被证明是加速软件执行的有效解决方案，通过混合为串行处理优化的少数CPU内核和为大规模并行计算设计的许多较小的GPU内核。此外，除了高性能之外，由于对低功耗的需求，最近的趋势是将gpu和CPU结合到一个芯片上(例如，AMD Fusion，英特尔Sandy Bridge, NVIDIA Tegra)。在计算能力和功耗之间的良好权衡使得集成gpu成为加速嵌入式系统广泛软件应用程序的有前途的替代方案。然而，必须重新设计算法以利用这些体系结构，而这种手动并行化通常会导致不满意的结果。本文提出了一种通过重用现有的和预先验证的寄存器传输级(RTL)知识产权(ip)来自动生成gpu软件应用程序的方法。该方法旨在利用RTL ip的内在并行性(如进程并发性和管道微架构)来生成功能的并行软件实现。实验结果表明，将RTL功能作为软件应用程序在gpu上运行所获得的性能优于将RTL代码映射到硬件加速器中所提供的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

On the automatic generation of GPU-oriented software applications from RTL IPs

Graphics processing units (GPUs) have been explored as a new computing paradigm for accelerating computation intensive applications. In particular, the combination between GPUs and CPU has proved to be an effective solution for accelerating the software execution, by mixing the few CPU cores optimized for serial processing with many smaller GPU cores designed for massively parallel computations. In addition, sustained by the need of low power consumption besides high performance, a recent trend is combining GPUs and CPU onto a single die (e.g., AMD Fusion, Intel Sandy Bridge, NVIDIA Tegra). The good tradeoff between computing capability and power consumption makes the integrated GPUs a promising alternative for accelerating a wide range of software application for embedded systems. Nevertheless, algorithms must be redesigned to take advantage of these architectures and such a manual parallelization often results in being unsatisfactory. This paper presents a methodology to automatically generate software applications for GPUs, by reusing existing and preverified register-transfer level (RTL) intellectual-properties (IPs). The methodology aims at exploiting the intrinsic parallelism of RTL IPs (such as process concurrency and pipeline micro-architecture) for generating the parallel software implementation of the functionality. The experimental results show how the performance obtained by running the RTL functionality as software applications on GPUs outperform those provided by the RTL code mapped into a hardware accelerator.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

自引率

0.00%

发文量