Michaela Amoo, Youngsoo Kim, Vance Alford, Shrikant S. Jadhav, Naser El-Bathy, C. Gloster
{"title":"An Automated Design Framework for Floating Point Scientific Algorithms using Field Programmable Gate Arrays (FPGAs) (Abstract Only)","authors":"Michaela Amoo, Youngsoo Kim, Vance Alford, Shrikant S. Jadhav, Naser El-Bathy, C. Gloster","doi":"10.1145/2684746.2689101","DOIUrl":"https://doi.org/10.1145/2684746.2689101","url":null,"abstract":"This paper presents a reconfigurable computing environment while addressing the problem of porting High Performance Computing (HPC) applications directly to Field Programmable Gate Arrays (FPGAs)-based architectures. The objectives of this research are developing a comprehensive floating point library of essential functions for scientific applications; demonstrate order of magnitude speedup of reconfigurable computing applications, demonstrating the effectiveness of automated design framework for both development and test of scientific algorithms. The developed framework can be reused in various scientific applications which shares kernel functions. The study of this research has identified an exponential function as a kernel for cellular ophthalmoscopy camera processing, traffic monitoring and light wave simulation. The paper demonstrates 30x speedup of these kernels in three algorithms using its novel architecture and its automated toolset. Exponential kernel generation case study and its flexible hardware implementation on an FPGA has been validated onto a Xilinx LX-100 device and the Nallatech H101-PCIXM FPGA board.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132055680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Source-Level Transformations to Improve High-Level Synthesis Debug and Validation on FPGAs","authors":"Joshua S. Monson, B. Hutchings","doi":"10.1145/2684746.2689087","DOIUrl":"https://doi.org/10.1145/2684746.2689087","url":null,"abstract":"This paper proposes a method for extending source-level visibility into the RTL of an HLS-generated design using automated source-level transformations. Using our method, source-level visibility can be extended into co-simulation, in-system simulation, and hardware execution of any HLS tool that provides the ability to infer top-level ports. Experimental results show the feasibility of our method in situations where visibility needs to be added without modifying the timing, latency, or throughput of the design.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129716607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Logic Gates in the routing network of FPGAs (Abstract Only)","authors":"Elias Vansteenkiste, Berg Severens, D. Stroobandt","doi":"10.1145/2684746.2689098","DOIUrl":"https://doi.org/10.1145/2684746.2689098","url":null,"abstract":"We propose a new kind of FPGA architecture with a routing network that not only provides interconnections between the functional blocks but also performs some logic operation. More specifically we replaced the routing multiplexer node in the conventional architecture with an element that can be used as both AND gate and multiplexer. A conventional routing multiplexer node consists of a multiplexer and a two stage buffer. In our new architecture a NAND gate replaces the first inverter stage of the buffer and two multiplexers half the size of the original multiplexer replace the original multiplexer. The aim of this study is to determine if this kind of architecture is feasible and if it is worth to implement pack, placement and routing tools in the future. We developed a new technology-mapping algorithm and sized the transistors in this new architecture to evaluate the area and delay. Preliminary results indicate that the gain in logic depth and area achieved by mapping to not only LUTs but also to AND gates outweighs the overhead of introducing AND gates in the routing network with a net reduction in area-delay product of 5.6. Designs implemented on the proposed architecture would require 11.2 % more area, but they will have a 14 % decreased logic depth and the architecture has a slightly faster representative critical path. These results are preliminary because the pack, place and route routines are not implemented yet.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130597125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Fixed-Point FPGA Implementation of SVPWM for a Two-Level Inverter (Abstract Only)","authors":"D. Mohammadi, S. Ahmed-Zaid, N. Rafla","doi":"10.1145/2684746.2689144","DOIUrl":"https://doi.org/10.1145/2684746.2689144","url":null,"abstract":"This paper presents an optimized fixed-point implementation of space-vector pulse-width modulation (SVPWM) for a two-level inverter. Bit-width fixed-point signals as well as circuit area are minimized by meeting the desired design accuracy. Most of the designs currently available are specified in floating-point precision to speed the process of simulating their functionality. However, area-optimized hardware implementation of these algorithms requires fixed-point precision. A generic function is used to formulate the precision required for each signal to get the proper accuracy. A non-convex optimization problem is solved for the number of required bit-widths for the signals. This solution has been simulated and implemented on FPGA to verify the resulting accuracy.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121066466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a Loeffler DCT using Xilinx Vivado HLS (Abstract Only)","authors":"Seung Yeol Baik, S. Jeong, H. Oh","doi":"10.1145/2684746.2694735","DOIUrl":"https://doi.org/10.1145/2684746.2694735","url":null,"abstract":"Loeffler discrete cosine transform (DCT) algorithm is recognized as the most efficient one because it requires the theoretically least number of multiplications. However, many applications still encounter difficulty in performing the 11 multiplications required by the algorithm to calculate a 1D eight-point DCT. To avoid expensive multipliers in the hardware, we used two design methods, namely, distributed arithmetic (DA) and shift-and-add (SAA) methods, to design the DCT accelerator. The memory bandwidth is 60 bits: 24 bits for reads of the R(red), G(green), and B(blue) data of a pixel and 36 bits for writes of three corresponding 12-bit DCT coefficients. Thus, the 1D eight-point DCT accelerator for each of R, G, and B can have one 12-bit input port and one 12-bit output port so that it can calculate a 2D DCT by row-column decomposition method. The designs are adjusted to produce the same latency and interval. DA seems promising because Loeffler DCT requires only three small tables with four input bits. However, our experiments using Xilinx Vivado HLS show that the SAA design is better than the DA design for the considered applications. Furthermore, simulation results suggest that the optimal accelerator design can be obtained by adjusting the SAA design to the considered applications. The resultant SAA design requires only 13 adders (per color component) and can calculate one DCT coefficient per clock cycle. The precision of the internal hardware has been adjusted, such that the reconstructed images have PSNR values of at least 39.1 dB for all test images (Lenna, Pepper, House, and Cameraman). If a precision of 13bits is allowed, PSNR becomes at least 44.8 dB. Our presentation describes the architecture and operation of the optimized SAA design.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125372408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters","authors":"Nachiket Kapre, Harnhua Ng, K. Teo, J. Naude","doi":"10.1145/2684746.2689081","DOIUrl":"https://doi.org/10.1145/2684746.2689081","url":null,"abstract":"FPGA CAD tool parameters controlling synthesis optimizations, place and route effort, mapping criteria along with user-supplied physical constraints can affect timing results of the circuit by as much as 70% without any change in original source code. A correct selection of these parameters across a diverse set of benchmarks with varying characteristics and design goals is challenging. The sheer number of parameters and option values that can be selected is large (thousands of combinations for modern CAD tools) with often conflicting interactions. In this paper, we present InTime, a machine-learning approach supported by a cloud-based (or cluster-based) compilation infrastructure for automating the selection of these parameters effectively to minimize timing costs. InTime builds a database of results from a series of preliminary runs based on canned configurations of CAD options. It then learns from these runs to predict the next series of CAD tool options to improve timing results. Towards the end, we rely on a limited degree of statistical sampling of certain options like placer and synthesis seeds to further tighten results. Using our approach, we show 70% reduction in final timing results across industrial benchmark problems for the Altera CAD flow. This is 30% better than vendor-supplied design space exploration tools that attempts a similar optimization using canned heuristics.","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132961097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","authors":"","doi":"10.1145/2684746","DOIUrl":"https://doi.org/10.1145/2684746","url":null,"abstract":"","PeriodicalId":388546,"journal":{"name":"Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130699748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}