{"title":"Building Zynq® accelerators with Vivado® high level synthesis","authors":"S. Neuendorffer, F. Martinez-Vallina","doi":"10.1145/2435264.2435266","DOIUrl":"https://doi.org/10.1145/2435264.2435266","url":null,"abstract":"Engineering complex systems inevitably requires a designer to balance many conflicting design requirements including performance, cost, power, and design time. In many cases, FPGAs enable engineers to balance these design requirements in ways not possible with other technologies like ASICs, ASSPs, GPUs or general purpose processors. This tutorial will focus on two of the newest commercial FPGA-related technologies, High Level Synthesis (HLS) and Programmable Logic integrated tightly with high performance embedded processors. In particular, we will present a detailed introduction to Vivado HLS, which is capable of synthesizing optimized FPGA circuits from algorithmic descriptions in C, C++ and SystemC. We will also present an introduction to the architecture of Zynq devices and show how interesting system architectures can be constructed using High Level Synthesis and the programmable logic portion of these devices.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"16 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79252300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Precision fault injection method based on correspondence between configuration bitstream and architecture (abstract only)","authors":"Jing Zhou, Lei Chen, Shuo Wang","doi":"10.1145/2435264.2435317","DOIUrl":"https://doi.org/10.1145/2435264.2435317","url":null,"abstract":"SRAM-based FPGAs are increasingly being used; however, they are susceptible to SEUs. To emulate the effects of SEUs, a variety of fault injection techniques have been studied, but the fault injection process itself helps little in studying the SEU mechanism. For further study, a novel Automated Precision Fault Injection System (APFIS) has been developed by Beijing Microelectronics Technology Institute (BMTI), which is engaged in the design, testing, packaging, and failure analysis of large-scale integration (LSI) and very-large-scale integration (VLSI) circuits. However, the APFIS is not precise enough. As a result, a more accurate precision fault injection method is studied in this paper, and the Automated Precision Fault Injection System-II (APFIS-II) is built on this method. Early Xilinx devices are still used in special applications, yet they lack such tools, which allow users to optimize their designs conveniently. In this paper, APFIS-II is implemented with a Virtex device to improve the reliability of systems that contain early devices. The detailed information about the FPGA architecture and configuration bitstream is analyzed. After that, the correspondence between the on-chip FPGA resources and the configuration bitstream is drawn. According to this correspondence, the bitstream is divided into several segments. With APFIS-II, faults are accurately injected into a certain segment instead of the entire bitstream. As a result, faults can be injected into a specific on-chip resource. Through this method, the fault injection process is more effective and more targeted, which greatly helps the study of the SEU mechanism and of mitigation techniques.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"27 1","pages":"267"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78672373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A memory-efficient hardware architecture for real-time feature detection of the SIFT algorithm (abstract only)","authors":"Wenjuan Deng, Yiqun Zhu","doi":"10.1145/2435264.2435332","DOIUrl":"https://doi.org/10.1145/2435264.2435332","url":null,"abstract":"SIFT (Scale Invariant Feature Transform) is a popular image processing algorithm that has been widely used to solve image-matching problems. However, SIFT has high computational complexity and large memory requirements, which prevent it from being used in applications that cannot offer large on-chip memory. Based on an analysis of the memory requirements of SIFT feature detection, a novel memory access strategy is proposed to reduce hardware memory usage. In addition, to achieve real-time performance on high-resolution video streams, a dedicated hardware architecture with a multi-pixel processing scheme has been developed. Compared with conventional designs, our design achieves a hardware memory reduction of at least 58.8%.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"69 1","pages":"273"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86108196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving modular dynamic partial reconfiguration with a difference-based flow (abstract only)","authors":"Sezer Gören, Yusuf Turk, Ozgur Ozkurt, Abdullah Yildiz, H. F. Ugurdag","doi":"10.1145/2435264.2435324","DOIUrl":"https://doi.org/10.1145/2435264.2435324","url":null,"abstract":"Dynamic Partial Reconfiguration (DPR) of Xilinx FPGAs in cases where there is a significant logic difference between subsequent configurations is made possible by the Xilinx module-based PR flow. Xilinx supports this flow only for high-end FPGAs and requires a paid license, without which the Xilinx PlanAhead software disables the related knobs and features. This poster presents a unique methodology (called DPR-LD) that enables DPR of low-end and high-end Xilinx FPGAs and requires no paid license. DPR-LD stands for DPR for Large Differences. DPR-LD uses the free Xilinx difference-based bit file generation software (bitgen), which normally is meant only for small differences between subsequent configurations. DPR-LD can be realized through either FPGA Editor or PlanAhead. Our FPGA Editor flow requires several physical constraints to ensure contention-free implementation of static and dynamic modules. We use implementation, floorplanning, and placement constraints to partition the design into several physical regions (one per module) for mapping, packing, placement, and routing. To prevent the routing of one module from crossing over another module, \"fortress blocks\" are used to isolate the modules from each other. However, fortress blocks lead to wasted FPGA resources. On the other hand, in our PlanAhead flow, the physical constraints are entered via a GUI, and the corresponding actual physical constraints are generated automatically and without wasting FPGA resources. To evaluate the two approaches, a proof-of-concept application with a single dynamic region was implemented using both flows. In addition, a multiple dynamic region design was implemented with our PlanAhead flow.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"59 1","pages":"270"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85227297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing the power of FPGAs using Altera's OpenCL compiler","authors":"Deshanand P. Singh, Tomasz S. Czajkowski, A. Ling","doi":"10.1145/2435264.2435268","DOIUrl":"https://doi.org/10.1145/2435264.2435268","url":null,"abstract":"In recent years, Field-Programmable Gate Arrays have become extremely powerful computational platforms that can efficiently solve many complex problems. The most modern FPGAs comprise effectively millions of programmable elements, signal processing elements and high-speed interfaces, all of which are necessary to deliver a complete solution. The power of FPGAs is unlocked via low-level programming languages such as VHDL and Verilog, which allow designers to explicitly specify the behavior of each programmable element. While these languages provide a means to create highly efficient logic circuits, they are akin to \"assembly language\" programming for modern processors. This is a serious limiting factor for both productivity and the adoption of FPGAs on a wider scale. In this talk, we use the OpenCL language to explore techniques that allow us to program FPGAs at a level of abstraction closer to traditional software-centric approaches. OpenCL is an industry standard parallel language based on 'C' that offers numerous advantages that enable designers to take full advantage of the capabilities offered by FPGAs, while providing a high-level design entry language that is familiar to a wide range of programmers. To demonstrate the advantages a high-level programming language can offer, we demonstrate how to use Altera's OpenCL Compiler on a set of case studies. The first application is single-precision general-element matrix multiplication (SGEMM). It is an example of a highly-parallel algorithm for which efficient circuit structures are well known. We show how this application can be implemented in OpenCL and how the high-level description can be optimized to generate the most efficient circuit in hardware. The second application is a Fast Fourier Transform (FFT), which is a classical FPGA benchmark that is known to have a good implementation on FPGAs. We show how we can implement the FFT algorithm, while exploring the many different possible architectural choices that lead to an optimized implementation for a given FPGA. Finally, we discuss a Monte-Carlo Black-Scholes simulation, which demonstrates the computational power of FPGAs. We describe how a random number generator in conjunction with computationally intensive operations can be harnessed on an FPGA to generate a high-speed benchmark, which also consumes far less power than the same benchmark running on a comparable GPU. We conclude the tutorial with a set of live demonstrations. Through this tutorial we show the benefits high-level languages offer for system-level design and productivity. In particular, Altera's OpenCL compiler is shown to enable high-performance application design that fully utilizes capabilities of modern FPGAs.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"277 1","pages":"5-6"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91539283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A remote memory access infrastructure for global address space programming models in FPGAs","authors":"Ruediger Willenberg, P. Chow","doi":"10.1145/2435264.2435301","DOIUrl":"https://doi.org/10.1145/2435264.2435301","url":null,"abstract":"We are proposing a shared-memory communication infrastructure that provides a common parallel programming interface for FPGA and CPU components in a heterogeneous system. Our intent is to ease the integration of reconfigurable hardware into parallel programming models like Partitioned Global Address Space (PGAS). For this purpose, we introduce a remote memory access component based on Active Messages that implements the core API of the Berkeley GASNet communication library, and a simple controller that manages communication and synchronization for custom FPGA cores. We demonstrate how these components deliver a simple and easily configurable communication mechanism between distributed memories in a multi-FPGA system with processors as well as custom hardware nodes.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"87 1","pages":"211-220"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83792510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low power FPGA design using post-silicon device aging (abstract only)","authors":"Sheng Wei, J. Zheng, M. Potkonjak","doi":"10.1145/2435264.2435340","DOIUrl":"https://doi.org/10.1145/2435264.2435340","url":null,"abstract":"The impact of process variation (PV) in deep submicron CMOS technologies has raised major concerns for energy optimization efforts in FPGAs. We have developed a post-silicon leakage energy optimization scheme that raises the threshold voltage (by way of negative bias temperature instability (NBTI) aging) of the components that are either unused or not on the critical timing paths, thereby reducing the total leakage energy consumption. In order to obtain the input vectors for aging only the targeted transistors, we map the problem of minimizing leakage energy under timing constraints to an instance of the satisfiability (SAT) problem. We implemented low power designs targeting Xilinx Spartan6 FPGAs and analyzed the potential leakage power savings over a set of ITC99 and Opencores benchmarks. The analysis of the experimental results shows a substantial amount of potential leakage energy reduction with very small performance degradation.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"1998 1","pages":"277"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78616961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genome sequencing using MapReduce on FPGA with multiple hardware accelerators (abstract only)","authors":"Chao Wang, Xi Li, Xuehai Zhou, Jim Martin, R. Cheung","doi":"10.1145/2435264.2435313","DOIUrl":"https://doi.org/10.1145/2435264.2435313","url":null,"abstract":"The genome sequencing problem with short reads is an emerging field with seemingly limitless possibilities for advances in numerous scientific research and application domains. It has been a hot topic over the past few years. As data volumes grow and access becomes easier for individual users, shortening the response time of short-read mapping in large-scale computing is extremely important. In this paper we propose a novel FPGA-based acceleration solution that uses a MapReduce framework on multiple hardware acceleration engines. The combination of hardware accelerators and the MapReduce execution flow could greatly expedite the task of aligning short reads to a known reference genome. Our approach is based on preprocessing the reference genomes and running iterative jobs to align the continuously incoming reads. The read-mapping algorithm is modeled after the creditable RMAP software approach. Furthermore, a theoretical speedup analysis on a MapReduce programming platform is presented, which demonstrates that our proposed architecture has the potential to reduce the average waiting time for large-scale short-read applications.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"18 2","pages":"266"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72570827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully-functional FPGA prototype with fine-grain programmable body biasing","authors":"M. Hioki, T. Sekigawa, T. Nakagawa, H. Koike, Y. Matsumoto, Takashi Kawanami, T. Tsutsumi","doi":"10.1145/2435264.2435280","DOIUrl":"https://doi.org/10.1145/2435264.2435280","url":null,"abstract":"A fully-functional FPGA prototype chip in which the programmable body bias voltage can be individually applied to elemental circuits such as MUXes, LUTs and DFFs is fabricated using low-power 90-nm bulk CMOS technology, and the area overhead, dynamic current, static current and operational speed are evaluated in silicon. In measurements, 10 ISCAS benchmark circuits are implemented by employing newly developed CAD tools, which consist of a VT mapper as well as a placer and router. The mask layout shows that well-separated margins, programmable body bias circuits, and additional configuration memories occupy 54% of the FPGA tile area. Measurement results show that the fabricated FPGA reduces the static current by 91.4% on average. In addition, evaluations by implementing a ring oscillator with various body bias voltage pairs demonstrate a static current reduction from 23.1 uA to 1.0 uA by assigning low threshold voltage to the MOSFETs on the critical path and high threshold voltage to the rest of the MOSFETs, while maintaining the same oscillation frequency of 6.6 MHz as when all MOSFETs are assigned low threshold voltage. Moreover, the fine-grain programmable body bias technique accelerates the oscillation frequency of a ring oscillator implemented on the FPGA by aggressively applying forward body bias voltage, while assigning HVT to MOSFETs on the non-critical path by applying reverse body biasing effectively suppresses the exponential increase of static current caused by the forward body biasing.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"10 1","pages":"73-80"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72735824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Custom instruction generation and mapping for reconfigurable instruction set processors (abstract only)","authors":"Chao Wang, Xi Li, Huizhen Zhang, J. Ji, Xuehai Zhou","doi":"10.1145/2435264.2435318","DOIUrl":"https://doi.org/10.1145/2435264.2435318","url":null,"abstract":"Reconfigurable instruction set processors (RISP) are an emerging research field for state-of-the-art adaptive systems. However, generating custom instructions and mapping them to the original code still poses significant challenges. This paper proposes a generation and mapping scheme to extend custom instructions for adaptive RISP. First, target function blocks (basic blocks) are generated from a dynamic profiler. Then the selected hot spot is treated as a custom instruction and implemented in reconfigurable hardware logic units. With respect to instruction selection, an instruction generator is used to provide a mapping mechanism from hot blocks to hardware implementations, using data flow analysis, instruction clustering, subgraph enumeration and subgraph merging techniques. Finally, the original executable files are recompiled and regenerated by a customized GCC compiler. To demonstrate the effectiveness and performance of the framework, a prototype instruction generator has been implemented to verify the correctness and efficiency of the mapping mechanism.","PeriodicalId":87257,"journal":{"name":"FPGA. ACM International Symposium on Field-Programmable Gate Arrays","volume":"110 1","pages":"268"},"PeriodicalIF":0.0,"publicationDate":"2013-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80552569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}