Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: Latest Articles, Page 5

RapidPath: Accelerating Constrained Shortest Path Finding in Graphs on FPGA (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689135

Chao Wang, Xi Li, Qi Guo, Xuehai Zhou

Abstract: Emerging applications such as software-defined networking (SDN), social media, and location-based systems (LBS) are typical big-graph applications. Because of the explosive flood of network data, it is essential to speed up computation in these applications, and the Constrained Shortest Path Finding (CSPF) algorithm is one of the most challenging parts. Meanwhile, FPGAs have become an effective and efficient platform for novel big data architectures and systems, owing to their computing power and low power consumption, and they let researchers deploy many accelerators within a single chip. In this paper, we present RapidPath, an acceleration method for the CSPF algorithm in software-defined networks, which decomposes a large and complex system of programs into small single-purpose source code libraries that perform specialized tasks in parallel. Only the CSPF step is implemented in hardware; the remaining steps run on the processor. We have built a prototype system on Zynq with CSPF case studies. The ARM processor shares memory with the FPGA-based accelerator through DMA-based channels, and control signals are transferred via AXI bus interfaces. Experimental results show that RapidPath achieves up to 43.75X speedup at 128 nodes compared with software execution (without cache) on a Xilinx Zynq board. Furthermore, hardware cost and overhead measurements show that the RapidPath architecture achieves high speedup at insignificant cost.
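For readers unfamiliar with CSPF, the following is a minimal software sketch of the general idea, not the RapidPath hardware design. It assumes the constraint is a minimum link bandwidth (a common choice in network routing that the abstract does not specify) and prunes links that fail it while running Dijkstra's algorithm. The function name `cspf`, the graph layout, and the demo values are all hypothetical.

```python
import heapq

def cspf(graph, source, target, min_bandwidth):
    """Shortest path by cost, using only links with enough bandwidth.

    graph maps node -> list of (neighbor, cost, bandwidth) tuples.
    Returns (total_cost, path) or None if no feasible path exists.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, cost, bandwidth in graph.get(u, []):
            if bandwidth < min_bandwidth:
                continue  # constraint check: prune this link
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path

# Invented four-node example: the cheap route A-B-D has a low-bandwidth link.
DEMO_GRAPH = {
    "A": [("B", 1, 10), ("C", 2, 10)],
    "B": [("D", 1, 2)],
    "C": [("D", 2, 10)],
    "D": [],
}
```

With a bandwidth requirement of 5, the search in this example avoids the B-D link and returns the longer route through C; with a requirement of 1, it takes the cheaper route through B.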

Citations: 0

Session details: Keynote Speech

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251649

S. Neuendorffer

Citations: 0

Resource-Aware Throughput Optimization for High-Level Synthesis

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689065

Peng Li, Peng Zhang, L. Pouchet, J. Cong

Abstract: With the emergence of robust high-level synthesis tools that automatically transform code written in high-level languages into RTL implementations, programming productivity when synthesizing accelerators improves significantly. However, although state-of-the-art high-level synthesis tools can produce high-quality designs for simple nested-loop kernels, there is still a significant performance gap between the synthesized and the optimal design for complex real-world applications with multiple loops. In this work we first demonstrate that maximizing the throughput of each individual loop is not always the most efficient way to achieve maximum system-level throughput. More area-efficient, non-fully-pipelined design variants may outperform the fully pipelined version by enabling larger degrees of parallelism. We develop an algorithm that determines the optimal resource usage and initiation interval for each loop in an application to achieve maximum throughput within a given area budget. We report experimental results on eight applications, showing an average 31% performance speedup over state-of-the-art HLS solutions.
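The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not the authors' algorithm: it is a brute-force search under the assumptions that loops run one after another, that a loop with initiation interval II replicated `copies` times needs `trip_count * II / copies` cycles, and that area scales linearly with the number of copies. All names and numbers are invented for illustration.

```python
from itertools import product

# Hypothetical loop with two design variants: (initiation interval, area per copy).
DEMO_LOOPS = {
    "loop_a": {"trip_count": 1200, "variants": [(1, 10), (2, 4)]},
}

def best_configuration(loops, area_budget, max_copies=8):
    """Exhaustively pick (II, area, copies) per loop to minimize total cycles."""
    names = list(loops)
    choices = []
    for name in names:
        options = [
            (ii, area, copies)
            for ii, area in loops[name]["variants"]
            for copies in range(1, max_copies + 1)
        ]
        choices.append(options)

    best = None
    for combo in product(*choices):
        total_area = sum(area * copies for _, area, copies in combo)
        if total_area > area_budget:
            continue  # does not fit in the area budget
        cycles = sum(
            loops[name]["trip_count"] * ii / copies
            for name, (ii, _, copies) in zip(names, combo)
        )
        if best is None or cycles < best[0]:
            best = (cycles, dict(zip(names, combo)))
    return best
```

In this toy example with an area budget of 12, the fully pipelined variant (II = 1, area 10) fits only once and needs 1200 cycles, while three copies of the smaller II = 2 variant (area 4 each) finish in 800 cycles, which mirrors the abstract's point that non-fully-pipelined variants can win by enabling more parallelism.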

Citations: 34

Customizable and High Performance Matrix Multiplication Kernel on FPGA (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689147

Jie Wang, J. Cong

Citations: 2

200 MS/s ADC implemented in a FPGA employing TDCs

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689070

H. Homulle, F. Regazzoni, E. Charbon

Abstract: Analog signals are used in many applications and systems, such as cyber-physical systems, sensor networks, and automotive applications. These are also applications where the use of FPGAs is continuously growing. To date, however, there is no direct integration between FPGAs, which are digital, and the analog world (except in the newest generation of FPGAs). Currently, an external analog-to-digital converter (ADC) has to be added to the system, which limits its overall compactness and flexibility. To address this issue, we propose a novel architecture that implements a high-speed ADC in reconfigurable devices. The system exploits picosecond-resolution time-to-digital converters (TDCs) to reach a conversion rate as fast as its clock speed. The resulting analog-through-time-to-digital converter (ATDC) achieves a sampling rate of 200 MS/s with 7-bit resolution for signals ranging from 0 to 2.5 V. Except for the external resistor needed for the analog reference ramp, the system is fully integrated inside the target FPGA. Moreover, our design can be easily scaled to multichannel ADCs, demonstrating the suitability of reconfigurable devices for applications that require deep integration between the analog and digital worlds.
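The figures quoted in the abstract imply a voltage step and a timing budget that are easy to work out. The sketch below does this arithmetic under a simplifying assumption that is not taken from the paper: an ideal linear reference ramp sweeps the full 0 to 2.5 V range once per sample period, so the time at which the ramp crosses the input maps linearly to an output code. The function names are hypothetical.

```python
# Figures quoted in the abstract.
SAMPLE_RATE_HZ = 200e6
RESOLUTION_BITS = 7
FULL_SCALE_V = 2.5

NUM_CODES = 2 ** RESOLUTION_BITS      # 128 output codes
LSB_V = FULL_SCALE_V / NUM_CODES      # voltage step per code
PERIOD_PS = 1e12 / SAMPLE_RATE_HZ     # sample period in picoseconds

# Assumption (not stated in the abstract): one ideal linear ramp per period,
# so each code corresponds to an equal slice of the period.
TIME_PER_CODE_PS = PERIOD_PS / NUM_CODES

def code_from_crossing_time(t_ps):
    """Map a measured ramp-crossing time (ps) to an output code, with clamping."""
    code = int(t_ps // TIME_PER_CODE_PS)
    return max(0, min(NUM_CODES - 1, code))

def voltage_from_code(code):
    """Lower edge of the voltage bin represented by a code."""
    return code * LSB_V
```

Under this idealized model one code corresponds to about 19.5 mV and about 39 ps of ramp time, which suggests why time-to-digital converters with picosecond-scale resolution are needed.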

Citations: 18

A Novel Coefficient Address Generation Algorithm for Split-Radix FFT (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689134

Z. Qian, M. Margala

Abstract: The Split-Radix Fast Fourier Transform (SRFFT) has the lowest number of arithmetic operations among all FFT algorithms. Since arithmetic operations contribute substantially to dynamic power consumption, SRFFT is an ideal candidate for implementing a low-power FFT processor. The design of such processors requires an efficient addressing scheme for FFT data as well as coefficients. The signal flow graph of the split-radix algorithm is the same as that of the radix-2 FFT except for the location and value of the coefficients, so the conventional radix-2 FFT data address generation scheme can also be applied to SRFFT. However, the mixed-radix property of the SRFFT algorithm leads to irregular coefficient locations and rules out any conventional address generation algorithm. This paper presents a novel coefficient address generation algorithm for a shared-memory-based SRFFT processor. The core of the proposed algorithm is the use of two control variables to track trivial and non-trivial multiplications. We found the relationship between the values of the control variables and the butterfly and pass counters. The corresponding hardware implementation is simple, consisting of a shift register and a dual-port RAM bank. Compared with the look-up table approach, which pre-computes the addresses of all coefficients and stores them in memory units, the proposed algorithm is scalable and requires only a small amount of memory to find the correct coefficient addresses.
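To show where the split-radix coefficients come from, here is a textbook recursive split-radix FFT in software. It does not reproduce the paper's address generation algorithm or its shared-memory hardware; it only illustrates the standard decomposition into one half-length and two quarter-length transforms, with the coefficients W^k and W^(3k) that the abstract refers to. The function names are illustrative.

```python
import cmath

def split_radix_fft(x):
    """Recursive split-radix DFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 2:
        return [x[0] + x[1], x[0] - x[1]]
    even = split_radix_fft(x[0::2])   # N/2-point DFT of even-indexed samples
    odd1 = split_radix_fft(x[1::4])   # N/4-point DFT of samples 4m+1
    odd3 = split_radix_fft(x[3::4])   # N/4-point DFT of samples 4m+3
    quarter = n // 4
    out = [0j] * n
    for k in range(quarter):
        # For k == 0 both coefficients equal 1: a trivial multiplication.
        w1 = cmath.exp(-2j * cmath.pi * k / n)
        w3 = cmath.exp(-2j * cmath.pi * 3 * k / n)
        a = w1 * odd1[k]
        b = w3 * odd3[k]
        out[k] = even[k] + (a + b)
        out[k + 2 * quarter] = even[k] - (a + b)
        out[k + quarter] = even[k + quarter] - 1j * (a - b)
        out[k + 3 * quarter] = even[k + quarter] + 1j * (a - b)
    return out

def naive_dft(x):
    """Direct O(N^2) DFT, used only to check the result."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]
```

Because the even half is split radix-2 style while the odd half is split radix-4 style, the coefficient pattern differs from pass to pass, which is the irregularity the abstract's address generator has to handle.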

Citations: 2

Expanding OpenFlow Capabilities with Virtualized Reconfigurable Hardware

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689086

Stuart Byma, Naif Tarafdar, T. Xu, H. Bannazadeh, A. Leon-Garcia, P. Chow

Citations: 11

Session details: Technical Session 4: Architecture 2: Memory Systems

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251653

C. Ebeling

Citations: 0

Session details: Technical Session 1: Computer-aided Design

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/3251650

H. Schmit

Citations: 0

Silicon Verification using High-Level Design Tools (Abstract Only)

Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2015-02-22 DOI: 10.1145/2684746.2689131

Tomasz S. Czajkowski

Abstract: Modern FPGAs comprise ever more complex blocks to enable a wide variety of customer applications. Verifying these complex blocks can be a time-consuming process, especially in the late stages of the release cycle. A key challenge is the time it takes to create circuits that can run on a target device to test a given block. This paper demonstrates how high-level design tools, such as the Altera SDK for OpenCL, can be used to verify the operation of complex hardened blocks. As a proof of concept, we present the methodology used to verify the correctness of hardened single-precision floating-point adder, subtractor, and multiplier units on an Altera Arria 10 FPGA in a single day. Each design comprised an instance of a hardened floating-point unit (an adder, a subtractor, or a multiplier) and a functional equivalent thereof implemented purely in lookup tables (LUTs). Both the hardened module instance and the LUT implementation were generated from an OpenCL description using the Altera SDK for OpenCL. The results of each computation were compared between the two implementations, and any single discrepancy constituted a test failure. To simplify the test, the I/O for each design comprised LEDs (for pass/fail/running/done status) and two switches, start and reset. The test designs for the adder, subtractor, and multiplier were all written in OpenCL, and compilation took approximately 30 minutes per test design. Each design tested 4 billion test vectors, generated on-chip using a Mersenne Twister, and each test completed within 30 seconds. All tests passed verification in hardware.
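The compare-two-implementations methodology can be mimicked in software. The sketch below is only an analogy to the on-chip test, not the OpenCL designs from the paper: it draws 32-bit operand patterns from Python's `random` module (which uses a Mersenne Twister generator), feeds them to two single-precision adder models, and reports the first discrepancy. The two "units" here, `reference_add` and the deliberately broken `faulty_add`, are hypothetical stand-ins for the hardened block and its LUT equivalent.

```python
import math
import random
import struct

def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def float_to_bits(value):
    """Round a Python float to single precision and return its bit pattern."""
    try:
        return struct.unpack("<I", struct.pack("<f", value))[0]
    except OverflowError:
        # Finite as a double but too large for single precision.
        return float_to_bits(math.copysign(math.inf, value))

def reference_add(a_bits, b_bits):
    """Stand-in for one adder implementation."""
    return float_to_bits(bits_to_float(a_bits) + bits_to_float(b_bits))

def faulty_add(a_bits, b_bits):
    """Stand-in with an injected bug: the lowest result bit is always flipped."""
    return reference_add(a_bits, b_bits) ^ 1

def run_test(unit_a, unit_b, num_vectors, seed=0):
    """Return None on pass, or (index, a_bits, b_bits) of the first mismatch."""
    rng = random.Random(seed)  # Mersenne Twister pseudo-random generator
    for i in range(num_vectors):
        a_bits = rng.getrandbits(32)
        b_bits = rng.getrandbits(32)
        if unit_a(a_bits, b_bits) != unit_b(a_bits, b_bits):
            return (i, a_bits, b_bits)  # any single discrepancy is a failure
    return None
```

A run such as `run_test(reference_add, reference_add, 10000)` passes, while swapping in `faulty_add` fails on the first vector. The hardware test in the abstract applies the same pass/fail rule to 4 billion vectors per design.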

Citations: 1