Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays最新文献_第10页

An Automated Design Framework for Floating Point Scientific Algorithms using Field Programmable Gate Arrays (FPGAs) (Abstract Only) 基于现场可编程门阵列(fpga)的浮点科学算法自动化设计框架(仅摘要)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689101

Michaela Amoo, Youngsoo Kim, Vance Alford, Shrikant S. Jadhav, Naser El-Bathy, C. Gloster

{"title":"An Automated Design Framework for Floating Point Scientific Algorithms using Field Programmable Gate Arrays (FPGAs) (Abstract Only)","authors":"Michaela Amoo, Youngsoo Kim, Vance Alford, Shrikant S. Jadhav, Naser El-Bathy, C. Gloster","doi":"10.1145/2684746.2689101","DOIUrl":"https://doi.org/10.1145/2684746.2689101","url":null,"abstract":"This paper presents a reconfigurable computing environment while addressing the problem of porting High Performance Computing (HPC) applications directly to Field Programmable Gate Arrays (FPGAs)-based architectures. The objectives of this research are developing a comprehensive floating point library of essential functions for scientific applications; demonstrate order of magnitude speedup of reconfigurable computing applications, demonstrating the effectiveness of automated design framework for both development and test of scientific algorithms. The developed framework can be reused in various scientific applications which shares kernel functions. The study of this research has identified an exponential function as a kernel for cellular ophthalmoscopy camera processing, traffic monitoring and light wave simulation. The paper demonstrates 30x speedup of these kernels in three algorithms using its novel architecture and its automated toolset. Exponential kernel generation case study and its flexible hardware implementation on an FPGA has been validated onto a Xilinx LX-100 device and the Nallatech H101-PCIXM FPGA board.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132055680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Using Source-Level Transformations to Improve High-Level Synthesis Debug and Validation on FPGAs 使用源级转换改进fpga的高级综合调试和验证

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689087

Joshua S. Monson, B. Hutchings

引用次数: 35

Logic Gates in the routing network of FPGAs (Abstract Only) fpga路由网络中的逻辑门(仅摘要)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689098

Elias Vansteenkiste, Berg Severens, D. Stroobandt

{"title":"Logic Gates in the routing network of FPGAs (Abstract Only)","authors":"Elias Vansteenkiste, Berg Severens, D. Stroobandt","doi":"10.1145/2684746.2689098","DOIUrl":"https://doi.org/10.1145/2684746.2689098","url":null,"abstract":"We propose a new kind of FPGA architecture with a routing network that not only provides interconnections between the functional blocks but also performs some logic operation. More specifically we replaced the routing multiplexer node in the conventional architecture with an element that can be used as both AND gate and multiplexer. A conventional routing multiplexer node consists of a multiplexer and a two stage buffer. In our new architecture a NAND gate replaces the first inverter stage of the buffer and two multiplexers half the size of the original multiplexer replace the original multiplexer. The aim of this study is to determine if this kind of architecture is feasible and if it is worth to implement pack, placement and routing tools in the future. We developed a new technology-mapping algorithm and sized the transistors in this new architecture to evaluate the area and delay. Preliminary results indicate that the gain in logic depth and area achieved by mapping to not only LUTs but also to AND gates outweighs the overhead of introducing AND gates in the routing network with a net reduction in area-delay product of 5.6. Designs implemented on the proposed architecture would require 11.2 % more area, but they will have a 14 % decreased logic depth and the architecture has a slightly faster representative critical path. These results are preliminary because the pack, place and route routines are not implemented yet.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130597125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Optimized Fixed-Point FPGA Implementation of SVPWM for a Two-Level Inverter (Abstract Only) 双电平逆变器SVPWM的定点FPGA优化实现(仅摘要)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689144

D. Mohammadi, S. Ahmed-Zaid, N. Rafla

引用次数: 0

Design of a Loeffler DCT using Xilinx Vivado HLS (Abstract Only) 基于Xilinx Vivado HLS的Loeffler DCT设计(仅摘要)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2694735

Seung Yeol Baik, S. Jeong, H. Oh

{"title":"Design of a Loeffler DCT using Xilinx Vivado HLS (Abstract Only)","authors":"Seung Yeol Baik, S. Jeong, H. Oh","doi":"10.1145/2684746.2694735","DOIUrl":"https://doi.org/10.1145/2684746.2694735","url":null,"abstract":"Loeffler discrete cosine transform (DCT) algorithm is recognized as the most efficient one because it requires the theoretically least number of multiplications. However, many applications still encounter difficulty in performing the 11 multiplications required by the algorithm to calculate a 1D eight-point DCT. To avoid expensive multipliers in the hardware, we used two design methods, namely, distributed arithmetic (DA) and shift-and-add (SAA) methods, to design the DCT accelerator. The memory bandwidth is 60 bits: 24 bits for reads of the R(red), G(green), and B(blue) data of a pixel and 36 bits for writes of three corresponding 12-bit DCT coefficients. Thus, the 1D eight-point DCT accelerator for each of R, G, and B can have one 12-bit input port and one 12-bit output port so that it can calculate a 2D DCT by row-column decomposition method. The designs are adjusted to produce the same latency and interval. DA seems promising because Loeffler DCT requires only three small tables with four input bits. However, our experiments using Xilinx Vivado HLS show that the SAA design is better than the DA design for the considered applications. Furthermore, simulation results suggest that the optimal accelerator design can be obtained by adjusting the SAA design to the considered applications. The resultant SAA design requires only 13 adders (per color component) and can calculate one DCT coefficient per clock cycle. The precision of the internal hardware has been adjusted, such that the reconstructed images have PSNR values of at least 39.1 dB for all test images (Lenna, Pepper, House, and Cameraman). If a precision of 13bits is allowed, PSNR becomes at least 44.8 dB. Our presentation describes the architecture and operation of the optimized SAA design.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125372408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters 一种基于机器学习的FPGA CAD工具参数选择方法

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689081

Nachiket Kapre, Harnhua Ng, K. Teo, J. Naude

{"title":"InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters","authors":"Nachiket Kapre, Harnhua Ng, K. Teo, J. Naude","doi":"10.1145/2684746.2689081","DOIUrl":"https://doi.org/10.1145/2684746.2689081","url":null,"abstract":"FPGA CAD tool parameters controlling synthesis optimizations, place and route effort, mapping criteria along with user-supplied physical constraints can affect timing results of the circuit by as much as 70% without any change in original source code. A correct selection of these parameters across a diverse set of benchmarks with varying characteristics and design goals is challenging. The sheer number of parameters and option values that can be selected is large (thousands of combinations for modern CAD tools) with often conflicting interactions. In this paper, we present InTime, a machine-learning approach supported by a cloud-based (or cluster-based) compilation infrastructure for automating the selection of these parameters effectively to minimize timing costs. InTime builds a database of results from a series of preliminary runs based on canned configurations of CAD options. It then learns from these runs to predict the next series of CAD tool options to improve timing results. Towards the end, we rely on a limited degree of statistical sampling of certain options like placer and synthesis seeds to further tighten results. Using our approach, we show 70% reduction in final timing results across industrial benchmark problems for the Altera CAD flow. This is 30% better than vendor-supplied design space exploration tools that attempts a similar optimization using canned heuristics.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132961097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays 2015 ACM/SIGDA现场可编程门阵列国际研讨会论文集

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 1900-01-01 DOI: 10.1145/2684746

引用次数: 2